Bridge research and impacts


Tracking societal impacts encourages academics to pursue them. The launch of three new Nature journals should also help.


There is a classic narrative that stresses the importance and value of fundamental science. To make progress, one must take persistence by researchers, mix in patient financial support and then add creative imagination and logic (important for creating hypotheses and testing predictions). Then sprinkle on some unpredictable outcomes and stew for a century, or perhaps even longer.

The 2016 announcement of the detection of gravitational waves is a fine product of this recipe for success. It was born of theories of relativity that were esoteric but which now, unforeseeable at the time of their origin in 1916, underpin technologies such as global navigation. Readers of Nature probably have their own favourite examples of such success stories.

Support for fundamental research remains essential, both as a signal of cultural values and as a driver of future societal progress. But research with a shorter-term or more-local vision of practical outcomes deserves reward and prestige, too — a fact perhaps taken for granted by engineers or clinical scientists, but less so in some other disciplines.

Take, for instance, the way in which regulatory authorities, commercial organizations and physical geographers at the University of Leeds, UK, collaborated to boost water quality and company performance by developing innovative catchment-management strategies in the north of England. Another example is how local health authorities partnered with a digital-media-production company to disseminate content related to a self-help technique developed by psychiatry researchers at King’s College London to combat bulimia.

Both these examples are included in a database of case studies collected by the Higher Education Funding Council for England in its pioneering 2014 Research Excellence Framework (REF; see go.nature.com/2zags87). The council assesses the impact of research retrospectively, and rewards high performers with extra funds. This approach has increased financial support for some universities that pursue ‘useful’ research but that did not fare well in previous, more-traditional funding frameworks. The next REF, which will be conducted in 2021, will allocate more weight (25%, up from 20%) to impact assessments — a move that Nature supports. Other funders have signalled that they believe in direct impact, and demand a prospective view of such benefits in funding applications.

The database of REF case studies is interesting partly because it highlights straightforward ways of documenting impacts through explicit description and endorsement by researchers’ partners in delivery, and partly because it reveals the variety of pathways to impact.

Association with delivery partners and impact brings recognition and prestige, and so does the funding that such case studies help a university to acquire. Applying impact criteria in retrospective studies is not straightforward, given that real-world change may take years to occur (although where software or digital apps are concerned, progress can be faster). But such analyses can inform researchers and help them to anticipate and establish partnerships at the outset to boost eventual impact.


Impact can also depend on the dissemination of results — and we 
hope that Nature journals can help. Over the past few years, the Nature 
group of journals has developed to include multidisciplinary and proac- 
tively interdisciplinary journals specifically aimed at societal challenges, 
as well as at fundamental research across the relevant disciplines. Nature 
Climate Change was the first, and more recent launches include Nature 
Energy, Nature Human Behaviour and Nature Biomedical Engineering. 
Next week, we launch Nature Sustainability, Nature Electronics and Nature Catalysis. (This is not to ignore recent journals in more conventional disciplines, including microbiology, astronomy, and ecology and evolution.)

Journals that target societal issues typically grapple with an unusual issue for academic publishers: how to assess the significance of research that claims potential utility outside academia.

Sometimes, resolving this issue is relatively straightforward. In 
some strands of electronics and catalysis, for example, the academic 
and industrial communities are well connected, share goals and have 
clear, agreed pathways to the application of knowledge. So the potential 
impact — and thus the broader significance — of a paper that claims 
an application can be readily evaluated. 

In other areas of research, methods of judging potential impact might 
not be so established, and this makes it difficult to assess and referee a 
paper. For example, when considering a paper that cites policy relevance 
as a key claim to significance, a technical assessment alone will not 
suffice. To find suitable referees, editors might scan the literature, com- 
mittee memberships, academic societies and specialist journalism to 
find individuals who can separate genuine policy value from delusions. 

The challenge requires editors to be open-minded and also to enlist 
referees who can recognize the value in papers whose conceptual 
novelty might be low but whose impacts can be high — for example, 
because of a step-change in functionality of an application.

In Nature journals, the ultimate responsibility for selecting which
papers to publish lies with the editors — not with referees, not with 
external editorial boards. Is the decision-making therefore subjective? 
No more so than decisions in fundamental science can be, where the 
significance is not immediately obvious. The quality of advice is what 
counts, alongside the breadth of experience and outlook of the editors. 

Beyond the care and innovation needed in the refereeing, and the 
publication of good papers, how might research journals that seek 
to make research relevant add value? One way could be to help dis- 
seminate the impacts that followed research. Alongside citation and 
altmetric analyses, journals could publish narratives by researchers of 
what happened next, validated by testimonials from their partners or by 
other concrete evidence. Historians could apply this approach to much 
older papers — including those of past greats. What a richer, livelier 
and more impactful literature that would be.
WORLD VIEW

Put telescopes on the far side of the Moon

Current proposals for lunar development neglect our best chance to glimpse the beginnings of the Universe, says Joseph Silk.

Plans to return to the Moon are getting serious. Last month, US President Donald Trump declared that the next time US astronauts blast off, they will be headed to our rocky satellite. In September, the European Space Agency made its strongest call yet for the installation of a permanent, human-inhabited village at the lunar southern pole. China’s National Space Administration is pursuing a human outpost there, among other lunar projects, and private entrepreneurs are enthusiastic about mining minerals on the Moon and making rocket fuel for further space exploration.

But these initiatives are more technical and economic than scientific. Unless we start planning now, they will lack an exceptional asset — a lunar radio telescope. This would be uniquely poised to answer one of humanity’s most profound questions: what are our cosmic origins?

The far side of the Moon is the best place in the inner Solar System to monitor low-frequency radio waves — the only way of detecting certain faint ‘fingerprints’ that the Big Bang left on the cosmos. Earth-bound radio telescopes encounter too much interference from electromagnetic pollution caused by human activity, such as maritime communication and short-wave broadcasting, to get a clear signal, and Earth’s ionosphere blocks the longest wavelengths from reaching these scopes in the first place. We need these signals to learn whether and how the Universe inflated rapidly in the first trillionth of a trillionth of a trillionth of a second after the Big Bang.

To be sure, observations from Earth and orbiting satellites are impressive. The Sloan Digital Sky Survey, run by more than a dozen collaborating institutions, has mapped more than a million galaxies, and larger surveys under way could identify up to ten billion. But these galaxies formed hundreds of millions of years after inflation occurred.

The key to understanding early events in the Universe lies in the relics they left behind. One is a sea of electromagnetic radiation coming from every direction in the sky. Released around 380,000 years after the Big Bang, when the first atoms formed and the Universe was much hotter, this radiation cooled over time to microwave frequencies, and is now known as the cosmic microwave background.

Superimposed on this background are patterns from scattered photons: vestiges of the gravitational wells that seeded galaxies and other massive structures in the Universe. Studies from Earth-bound telescopes and orbiting satellites have mapped millions of these tiny ripples to produce precise estimates of the age of the Universe, rates of expansion and the relative amounts of visible matter, dark matter and dark energy. In December, one team won the US$3-million Breakthrough Prize in Fundamental Physics for their efforts towards this goal. But these projects cannot robustly detect the predicted fingerprints of inflation — skewed ‘twists’ in these ripples. To do that, we must find the signals that have travelled the farthest in our expanding Universe, and so represent the ‘dark ages’ — the first few hundred million years after the Big Bang, before the first stars formed. To gain the needed precision, we must look beyond the billions of observable galaxies to their building blocks: trillions of clouds of hydrogen gas.

In 1944, Dutch astronomer Hendrik van de Hulst theorized a way 
of detecting cold interstellar atomic hydrogen on the basis of a slight 
energy change in the atoms at a frequency of 1420.4 megahertz (MHz), 
a wavelength of 21.1 centimetres. This is now widely used to map the 
gas clouds between nearby stars. The same principle could let us map 
extremely remote hydrogen clouds, because inflation imprints a tiny 
distortion on the clouds’ distribution — called ‘primordial non-Gaussianity’ —
shadowed against the cosmic microwave background. It is the
only certain signal from the beginning of the Universe. 

But these subtle distortions of 21-centimetre 
radio waves from dark-age hydrogen clouds 
cannot be detected by current instruments on 
Earth. The distant signals are stretched by the 
Universe's expansion to a much lower frequency of 
30 MHz, where Earth’s ionosphere and terrestrial 
communications render signals unacceptably 
noisy. Only from the far side of the Moon — with 
no ionosphere and shielded from Earth-related 
interference — could we spot these dim shadows. 
This is where we could verify or falsify theories of 
inflation and assess whether scientists have settled 
on too simple a model of the Universe’s early stages.
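The stretching described above is the cosmological redshift, which relates emitted and observed frequencies by 1 + z = ν_rest / ν_obs. As a rough sketch of the arithmetic, using only the 1420.4 MHz and 30 MHz figures quoted here plus the speed of light (the helper names are illustrative, not from any astronomy package), the following Python turns those numbers into a redshift and the corresponding wavelengths.

```python
# Illustrative helpers; the frequencies are the ones quoted in the text.
NU_REST_MHZ = 1420.4        # rest-frame frequency of the hydrogen 21-cm line, MHz
C_M_PER_S = 299_792_458.0   # speed of light, metres per second


def redshift(nu_obs_mhz: float, nu_rest_mhz: float = NU_REST_MHZ) -> float:
    """Return the redshift z implied by observing a line at nu_obs_mhz.

    Uses 1 + z = nu_rest / nu_obs.
    """
    return nu_rest_mhz / nu_obs_mhz - 1.0


def wavelength_m(nu_mhz: float) -> float:
    """Convert a frequency in MHz to a wavelength in metres."""
    return C_M_PER_S / (nu_mhz * 1e6)


if __name__ == "__main__":
    z = redshift(30.0)
    print(f"z at 30 MHz: {z:.1f}")
    print(f"rest wavelength: {wavelength_m(NU_REST_MHZ) * 100:.1f} cm")
    print(f"observed wavelength: {wavelength_m(30.0):.1f} m")
```

Run as a script, it reports a redshift of about 46 and shows the 21-centimetre line stretched to a wavelength of roughly 10 metres, which places the signal in the band that the ionosphere and terrestrial transmitters disrupt.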

A radio array able to capture these data would 
probably use millions of simple radio antennas 
deployed over an area a hundred kilometres across 
on the Moon’s far side, operated by humans and
robots. Infrared telescopes of unprecedented scale could be built in cold 
craters near the lunar south pole, in permanent shadow where tem- 
peratures as low as 30 kelvin have been measured. With no atmosphere 
to absorb radiation and block signals, Moon-based scopes could yield 
fantastic images of exoplanets and the oldest galaxies in the Universe. 
Using the costs of the Hubble Space Telescope and the International Space Station, including their launches, as guides, I estimate that all these telescopes would
cost no more than 5% of other planned lunar operations.

Current proposals neglect the unique opportunity that a Moon- 
based telescope offers. Astronomers, ESA and NASA should develop 
the concept and promote the idea now, while lunar plans are still in 
their infancy. Rocket fuel from Moon ice and dollars from space tour- 
ists are grand. But if we really want to challenge the limits of human 
exploration, we should seek the beginnings of the Universe.


Joseph Silk is a professor of astronomy at Johns Hopkins University in 
Baltimore, Maryland, and at the Institute of Astrophysics, Paris. 
e-mail: jsilk@jhu.edu 
SEVEN DAYS

POLICY

China emissions
China announced on 19 December that it will press ahead with a national carbon-trading system to limit greenhouse-gas emissions, despite delays in its implementation. The system will require polluters to pay to emit carbon dioxide, and will initially cover more than 3 billion tonnes of CO2 that are emitted each year by the country’s power plants. That would make it the world’s largest carbon market, almost double the size of the European Union’s emissions-trading system. China had hoped to launch the national scheme last year, but officials have yet to set a launch date.


Gene therapy
The US Food and Drug Administration has, for the first time, approved a gene therapy for a disease caused by mutations in a specific gene. The decision, announced on 19 December, will allow Spark Therapeutics of Philadelphia, Pennsylvania, to market the treatment, voretigene neparvovec-rzyl (Luxturna), to people with a rare hereditary blindness. Luxturna is a modified virus that is injected into the eye to deliver a correct copy of the mutated gene. The healthy gene instructs cells in the retina to produce a protein that allows them to respond to light.

Science rebuffed
Australian Prime Minister Malcolm Turnbull has eliminated a cabinet-level government ministry for science, leaving the country without a science minister for only the second time since 1931. Turnbull announced the move, along with the creation of a lower-level science ministry, on 19 December as part of a broader cabinet shake-up. Michaelia Cash, the former acting minister for industry, innovation and science, becomes minister for jobs and innovation. Zed Seselja is the new assistant minister for science. Some researchers have said that the decision is inconsistent with the government’s stated commitment to putting science at the centre of policymaking.

PEOPLE

Neuroscientist dies
US neuroscientist Ben Barres — known for his pioneering studies of brain cells called glia, and for championing diversity in academia — died on 27 December, aged 63. Barres’s laboratory at Stanford University in California showed that glial cells — non-neuronal cells that are the most numerous cell type in the brain — had a central and previously unappreciated role in supporting crucial neural circuits in the brain. Born Barbara Barres in 1954, he transitioned genders in 1997. Barres campaigned hard for equal opportunities in science for women, minorities and early-career researchers. He was diagnosed with pancreatic cancer in 2016. See go.nature.com/2cafvpy for more.

Librarian freed
An appeals court in Egypt has overturned a prison sentence against Ismail Serageldin, the retired founding director of the country’s renowned Alexandria Library. Last July, Serageldin was found guilty of negligent management of the library, and sentenced to three and a half years in jail. Many considered the allegations politically motivated, and an international campaign was launched to free him. The charges were dismissed in a hearing on 26 December. The original Alexandria Library was created around the fourth century BC, when the city was the intellectual centre of the Hellenic world. It burnt down six centuries later. In 2001, Serageldin returned to Egypt from abroad to rebuild it in its modern form.

NASA competition
On 20 December, NASA chose two missions as finalists for its latest planetary-exploration programme. One, the Comet Astrobiology Exploration Sample Return (CAESAR) mission, would retrieve material from comet 67P/Churyumov–Gerasimenko, which the European Space Agency’s Rosetta spacecraft orbited and landed on between 2014 and 2016. The other would fly to multiple locations on Saturn’s moon Titan to sample its chemically complex surface and atmosphere. One of the proposals will be selected in 2019 to be built and launched in the mid-2020s.

TREND WATCH
France increased its number of compulsory infant vaccinations from 3 to 11 on 1 January, in the face of rising public distrust of vaccines and health authorities. Public surveys show that France has one of the lowest levels of confidence in vaccines in the world. Immunizations against diphtheria, tetanus and polio are already compulsory; the added vaccines are those against mumps, measles, rubella, whooping cough, hepatitis B, pneumonia, meningitis C and the Haemophilus influenzae bacterium.

[Chart: FRANCE BOOSTS COMPULSORY INFANT VACCINATIONS. France has tightened its immunization rules, but it has one of the highest rates of vaccine distrust in the world. Attitudes towards vaccination among French population (%) by survey year; categories: very favourable, somewhat favourable, somewhat unfavourable, very unfavourable. Data from surveys of people aged 18–75.]

NATURE.COM
For daily news updates see: www.nature.com/news
Philanthropists pour money into high-risk research
Ban lifted on some experiments with killer pathogens
Looking ahead towards scientific milestones in 2018
A closer look at chronic fatigue syndrome


Retinal images could allow computers to predict a person’s risk of an imminent heart attack. 


BIOLOGY


Deep learning sharpens views of cells and genes


Neural networks are making biological images easier to process. 


BY AMY MAXMEN 


Eyes are said to be the window to the soul, but researchers at Google see them as indicators of a person’s health. The technology giant is using deep learning to predict a person’s blood pressure, age and smoking status by analysing a photograph of their retina. Google’s computers glean clues from the arrangement of blood vessels — and a preliminary study suggests that the machines can use this information to predict whether someone is at risk of an impending heart attack.

The research relied on a convolutional neural network, a type of deep-learning algorithm that is transforming how biologists analyse images. Scientists are using the approach to find mutations in genomes and predict variations in the layout of single cells. Google’s approach, described in a preprint in August (R. Poplin et al. Preprint at https://arxiv.org/abs/1708.09843; 2017), is part of a wave of new deep-learning applications that are making image processing easier and more versatile — and could even identify overlooked biological phenomena.

“It was unrealistic to apply machine learning to many areas of biology before,” says Philip Nelson, a director of engineering at Google Research in Mountain View, California. “Now you can — but even more exciting, machines can now see things that humans might not have seen before.”

Convolutional neural networks allow computers to process an image efficiently and holistically, without splitting it into parts. The approach took off in the tech sector around 2012, enabled by advances in computer power and storage; for example, Facebook uses this type of deep learning to identify faces in photographs. But scientists struggled to apply the networks to biology, in part because of cultural differences between fields. “Take a group of smart biologists and put them in a room of smart computer scientists and they will talk two different languages to each other, and have different mindsets,” says Daphne Koller, chief computing officer at Calico — a biotechnology company in San Francisco, California, that is backed by Google’s parent, Alphabet.
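Processing an image “holistically” rests on the convolution at the heart of these networks: a small grid of weights, the kernel, is slid over every position of the image, and the same weights are reused everywhere. The pure-Python sketch below shows just that one operation, with a hand-picked edge-detecting kernel and hypothetical helper names; it is not code from Google, Facebook or any deep-learning library, and real networks stack many such layers whose kernels are learned from training images rather than set by hand.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over every position of `image` ('valid' mode, stride 1).

    As in most deep-learning libraries, this is a cross-correlation (the
    kernel is not flipped). The same small kernel is reused at every
    position, so one set of weights scans the whole image.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out: Matrix = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the image patch currently under the kernel.
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


def relu(m: Matrix) -> Matrix:
    """Element-wise non-linearity typically applied after a convolution."""
    return [[max(0.0, x) for x in row] for row in m]


if __name__ == "__main__":
    # A 4x4 'image': dark left half, bright right half.
    image = [[0, 0, 1, 1] for _ in range(4)]
    # Hand-set vertical-edge detector (a trained network would learn this).
    kernel = [[-1, 1], [-1, 1]]
    for row in relu(conv2d(image, kernel)):
        print(row)
```

On the half-dark, half-bright test image, the output is non-zero only in the column where the brightness jumps, so the single kernel has located the edge wherever it occurs in the image.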

Scientists also had to identify which types 
of study could be conducted using networks 
that must be trained with huge sets of images 
before they can start making predictions. 
When Google wanted to use deep learning to 
find mutations in genomes, its scientists had to 
convert strands of DNA letters into images that 
computers could recognize. Then they trained 
their network on DNA snippets that had been 
aligned with a reference genome, and whose 
mutations were known. The end result was 
DeepVariant, a tool released in December that 
can find small variations in DNA sequences. In 
tests, DeepVariant performed at least as well as
conventional tools. 
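Turning DNA letters into something a convolutional network can read can be as simple as one-hot encoding, in which each base gets its own row and a 1 marks each position where it appears. The sketch below is a generic illustration under that assumption, using a hypothetical `one_hot` helper; it is not DeepVariant’s encoding, which builds richer multi-channel images from many aligned sequencing reads.

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as a 4 x len(seq) grid of 0s and 1s.

    Row k marks where base BASES[k] occurs, so the sequence becomes a
    tiny single-channel 'image' that a convolutional network can scan.
    Characters outside ACGT (for example N) give an all-zero column.
    """
    grid = [[0] * len(seq) for _ in BASES]
    for col, base in enumerate(seq.upper()):
        row = BASES.find(base)  # -1 if the character is not A, C, G or T
        if row >= 0:
            grid[row][col] = 1
    return grid


if __name__ == "__main__":
    for base, row in zip(BASES, one_hot("GATTACA")):
        print(base, row)
```

Encoding the sequence this way keeps neighbouring bases next to each other, which is what lets a small convolutional kernel pick up local patterns in the sequence.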

Cell biologists at the Allen Institute for Cell
Science in Seattle, Washington, are using
convolutional neural networks to convert flat, grey
images of cells captured with light microscopes
into 3D images in which some of a cell’s orga- 
nelles are labelled in colour. The approach 
eliminates the need to stain cells — a process 
that requires more time and a sophisticated lab, 
and can damage the cell. Last month, the group 
published details of an advanced technique that 
can predict the shape and location of even more 
cell parts using just a few pieces of data — such 
as the cell’s outline (G. R. Johnson et al. Preprint 
at bioRxiv http://doi.org/chwv; 2017). 

“What you’re seeing now is an unprecedented
shift in how well machine learning can
accomplish biological tasks that have to do with
imaging,” says Anne Carpenter, director of the
Imaging Platform at the Broad Institute of MIT 
and Harvard in Cambridge, Massachusetts. In 
2015, her interdisciplinary team began to pro- 
cess cell images using convolutional neural 
networks; now, Carpenter says, the networks 
process about 15% of image data at her centre. 
She predicts that the approach will become the 
centre’s main mode of processing in a few years. 

Others are most excited by the idea that analysing images with convolutional neural networks could inadvertently reveal subtle biological phenomena, prompting biologists to ask questions they might not have considered before. “The most interesting phrase in science isn’t ‘Eureka!’ but ‘That’s weird — what’s going on?’” Nelson says.

Such serendipitous discoveries could help to 
advance disease research, says Rick Horwitz, the 
Allen Institute's executive director. If deep learn- 
ing can reveal subtle markers of cancer in an 
individual cell, he says, it could help to improve 
how researchers classify tumour progression. 
That could in turn trigger new hypotheses about 
how cancer spreads. 

Other machine-learning connoisseurs in 
biology have set their sights on new frontiers, 
now that convolutional neural networks are 
taking flight for image processing. “Imaging 
is important, but so is chemistry and molec- 
ular data,” says Alex Wolf, a computational 
biologist at the German Research Center for 
Environmental Health in Neuherberg. Wolf 
hopes to tweak neural networks so that they 
can analyse gene expression. “I think there 
will be a very big breakthrough in the next few 
years,” he says, “that allows biologists to apply
neural networks much more broadly.”


Facebook billionaire pours 
funds into high-risk research 


Silicon Valley philanthropy project revives some grants rejected by US government. 


BY EWEN CALLAWAY 


After his plan to test a cancer vaccine for pet dogs was rejected by the US National Institutes of Health (NIH), inventor and biochemist Stephen Johnston sought funding outside the mainstream system. On 20 December, the Open Philanthropy Project, a grant-giving organization that is largely funded by Facebook co-founder Dustin Moskovitz and his wife, Cari Tuna, announced that Johnston will receive US$6.4 million to test the vaccine he developed. His team at Arizona State University in Tempe is now poised to enrol its first pooches in a clinical trial.

The science-funding efforts of the Open 
Philanthropy Project, or Open Phil, have 
so far flown under the radar compared with 
those of other Silicon Valley funders. But that 
is likely to change. The organization, which 
was launched in 2011 but rebranded under 
its current name in 2014, has significantly 
boosted its spending to $200 million this year, 
of which around $40 million went to scientific
research. And Chris Somerville, a biochemist
and a scientific adviser to the organization,
says that Open Phil’s total spending will rise
several times over the coming years.


Moskovitz, whose estimated net worth is more than $14 billion, and Tuna have said that they plan to give away most of their fortune during their lifetimes. It is likely that, in terms of impact on research, Open Phil will soon rival better-known philanthropy vehicles, such as the Chan Zuckerberg Initiative in Palo Alto, California, which among other efforts awarded $50 million in life-sciences grants in 2017 to create a biohub in the San Francisco Bay Area.

Dogs with cancer are about to be enrolled in a clinical trial of a vaccine for the disease.

Open Phil, based in San Francisco, 
acknowledges the high odds of failure of 
the basic research it funds and, for a private 
funder, publishes brutally honest assess- 
ments of its projects. These range from 
developing lab-made meat alternatives to 
work on a controversial genetic-engineering
technology called gene drive. For its latest 
funding round, Open Phil asked scientists 
whose grant applications had been rejected 
by an NIH competition for risky research to 
dust off their proposals. Some 120 researchers
resubmitted their requests, and Open Phil
awarded $10.8 million in total to four teams.
“My hope is Open Philanthropy can 
make the world safe for serendipity again,” 
says Ed Boyden, a neuroscientist at the 
Massachusetts Institute of Technology in 
Cambridge, who won $3 million from the 
project in 2016. He is working to develop 
a technology that swells tissue to make it 
easier to examine under a microscope. 


TAKING A PUNT 

Gregory Timp, a biophysicist at the 
University of Notre Dame in South Bend, 
Indiana, who has won $2 million from 
Open Phil to develop a technology to 
sequence proteins, says that the evaluation 
process involved rebutting each of the NIH’s 
critiques of his proposal, as well as several 
rounds of interviews with scientist advisers. 
“They have scientific rigour couched in Cal- 
ifornia casual. Everything is informal, but 
they ask these piercing questions,” he says. 

Katherina Rosqueta, founding execu- 
tive director of the Center for High Impact 
Philanthropy at the University of Pennsylvania
in Philadelphia, says that the project’s
efforts to share its extensive research and
justify its giving make it stand out among
private funders. “They have a highly
analytical view. They have an appetite and
skill in conducting research and sourcing
information, and they’re willing to do that
in a public and transparent way.”

Many philanthropists shy away from 
basic science because the pay-offs tend to 
be long term and the risks high, says Marc 
Kastner, president of the Science Philan- 
thropy Alliance in Palo Alto, a coalition 
of foundations that advocates for private 
funding of basic science. But the Silicon 
Valley entrepreneurs who bankroll organi- 
zations such as the Open Philanthropy 
Project and the Chan Zuckerberg Initiative 
are used to long odds, says Kastner. “The 
risk-taking is not an issue for them. They 
don’t want to be supporting a sure thing.”
Ban on pathogen
studies lifted 


United States allows work to make viruses more dangerous. 


BY SARA REARDON 


The US government has lifted its controversial ban on funding experiments that make certain pathogens more deadly or transmissible. On 19 December, the National Institutes of Health (NIH) announced that scientists can once again use federal money to conduct ‘gain-of-function’ research on pathogens such as influenza viruses. But the agency also said that researchers’ grant applications will undergo greater scrutiny than in the past.

The goal is to standardize “a rigorous process that we really want to be sure we’re doing right”, says NIH director Francis Collins.

The NIH announcement ends a morato- 
rium on gain-of-function research that began 
in October 2014. Back then, some research- 
ers argued that the agency’s ban — which 
singled out research on the viruses that cause 
flu, severe acute respiratory syndrome and 
Middle East respiratory syndrome (MERS) 
— was too broad. The 21 projects halted by 
the policy included studies of seasonal flu and 
efforts to develop vaccines. The NIH eventu- 
ally allowed ten of these studies to proceed, 
but three projects using the MERS virus and 
eight dealing with flu remained ineligible for 
US government grants — until now. 

While the ban was in effect, the NIH and other government agencies examined the costs and benefits of allowing such research. In 2016, the National Science Advisory Board for Biosecurity — an independent panel that advises the NIH’s parent agency, the US Department of Health and Human Services (HHS) — concluded that very few government-funded gain-of-function experiments posed a significant threat to public health.

The new policy outlines a framework that 
the HHS will use to assess proposed research 
that would create pathogens with pandemic 
potential. Such work might involve modifying 
a virus to infect more species, or recreating a 
pathogen that has been eradicated in the wild, 
such as smallpox. There are some exceptions, 
however: vaccine development and epidemio- 
logical surveillance do not automatically 
trigger the HHS review. 


Influenza viruses can be modified in the lab. 


The plan includes a list of suggested factors 
for the HHS to consider, including an assess- 
ment of a project’s risks and benefits, and a 
determination of whether the investigator and 
institution are capable of conducting the work 
safely. It also says that an experiment should 
proceed only if there is no safer alternative 
method of achieving the same results. 

At the end of the assessment process, the 
HHS can recommend that the work go ahead, 
ask the researchers to modify their plan or suggest
that the NIH refuse funding. The NIH will
also judge the proposals’ scientific merit before
deciding whether to award grant funding.

Scientists have long debated the merits 
of gain-of-function research and the new 
decision could reopen that discussion. 

Yoshihiro Kawaoka, a virologist at the Uni- 
versity of Wisconsin—Madison, whose work 
was affected by the moratorium, says the 
new framework is “an important accomplish- 
ment”. Kawaoka, who studies how molecular 
changes in the avian flu virus could make it 
easier for birds to pass the infection to humans, 
now plans to apply for federal funding to 
experiment with live versions of the virus. 

But Marc Lipsitch, an epidemiologist at the 
Harvard T.H. Chan School of Public Health 
in Boston, Massachusetts, says that gain-of- 
function studies “have done almost nothing 
to improve our preparedness for pandem- 
ics — yet they risked creating an accidental 
pandemic”. 

Lipsitch argues that such experiments 
should not happen at all. But if the government 
is going to fund them, he says, it is good that 
there will be an extra level of review.
Ancient-genome studies could help to explain migration patterns in the Americas and genetic diversity among Native Americans.


What to look out for in 2018 


Moon missions, ancient genomes and a publishing showdown are set to shape the year. 


COSMIC DATA 

Fast radio bursts could become much less 
mysterious when the Canadian Hydrogen 
Intensity Mapping Experiment (CHIME) 
begins full operations this year. Astronomers 
hope to use CHIME to observe tens of these 
phenomena every day, boosting the current tally 
of just a few dozen in total. In April, astrono- 
mers will pounce on the second data set from 
the European Space Agency’s Gaia mission, 
which will reveal the position and motion of 
more than one billion stars in the Milky Way. 
The data could help to improve our understand- 
ing of the spiral structure of the Galaxy. 


ANCIENT AMERICANS 

Results from a slew of ancient-genome studies 
expected in 2018 could help to explain how 
humans spread across the Americas. Scientists 
hope to narrow down estimates of when and 
how people expanded into the region begin- 
ning around 15,000 years ago, and to clarify the 
timing and routes of subsequent migrations. 
The work might also help to explain the genetic 
diversity seen in today’s Native American pop- 
ulations. 


SCIENTIFIC-UNIT REVAMP 
After decades of work, the redefinition of four
units of measure should get the go-ahead in late
2018. At the General Conference on Weights
and Measures in November, delegates from 
58 countries will vote on adopting new defini- 
tions of the ampere, the kilogram, the kelvin and 
the mole. These will be based on exact values of 
fundamental constants, rather than on arbitrary 
or abstract definitions. If approved, the changes 
should take effect in May 2019. 


TO THE MOON AND BEYOND 

While NASA works on US President Donald 
Trump’s order to send astronauts back to the 
Moon, two other space agencies will attempt to 
land rovers on the lunar surface. In early 2018, 
India’s Chandrayaan-2 will mark the country’s 
first attempt at a controlled landing in space. 
Then, in December, China’s Chang’e-4 will
become the first probe to target the far side 
of the Moon. Elsewhere in the Solar System, 
the Japan Aerospace Exploration Agency’s 
Hayabusa-2 should reach the primitive Ryugu 
asteroid by July, and NASA’s Osiris-Rex is set to
reach the asteroid Bennu in late 2018. Both will 
return samples to Earth in the 2020s. 


CANCER’S BIGGER PICTURE 

Insights into the genes that regulate cancer 
could emerge this year as scientists pore over 
the first large-scale multiple-cancer sequenc- 
ing effort of whole genomes. They will also get 


results from another large sequencing project, 
the Cancer Genome Atlas, which will release its 
analysis of the protein-coding regions — known 
as the exome — of 33 types of tumour. 


CLIMATE LANDMARKS 

Countries that have signed on to the 2015 Paris 
climate agreement will outline how much pro- 
gress they have made towards meeting their 
individual commitments to reduce greenhouse- 
gas emissions — all in the hope of holding the 
average global temperature to 1.5-2 °C above 
pre-industrial levels — as part of a report called
the Facilitative Dialogue 2018. The Intergov- 
ernmental Panel on Climate Change will also 
release a special report outlining the conse- 
quences of a 1.5-degree temperature increase. 
And in September, California Governor Jerry 
Brown will host a major climate conference in 
support of the Paris agreement. 


EXTREME IMAGING 

Expect a raft of studies on how matter evolves 
under extreme conditions, such as in a planet’s 
core. New tools at X-ray free-electron laser 
(XFEL) facilities worldwide will enable scientists 
to image samples changing under high tem- 
perature and pressure. Biological and chemical 
reactions could also become cheaper to study 
when the first tabletop XFEL facilities open, at 
the German Electron Synchrotron
near Hamburg and Arizona State 
University in Tempe. 


POWER PLAY 

Midterm elections are approaching 
in the United States. History sug- 
gests that whichever party controls 
the White House — in this case, the 
Republicans — is likely to lose seats 
in Congress. But it's not clear whether 
Democrats will be able to flip enough 
positions in the House of Representa- 
tives or the Senate to gain a majority 
in either chamber. Eyes will also be 
on the record number of scientists 
running for local, state and federal 
offices. Elsewhere, the United King- 
dom will enter phase two of Brexit 
negotiations to determine the nation’s scientific 
collaboration with the European Union after the 
country leaves the bloc in 2019. 


SPACE-INDUSTRY BATTLES 

Up to five teams competing for the US$30-mil- 
lion Google Lunar XPrize have until 31 March 
to land and manoeuvre the first privately funded 
rover on the Moon, then beam back images. 
And aerospace firms Boeing and SpaceX plan 
to launch their first crewed flights to the Inter- 
national Space Station for NASA by November. 


The X-ray free-electron laser (XFEL) facility near Hamburg, Germany. 


DISEASE TREATMENTS 

Efforts to bring gene-editing tools such as 
CRISPR-Cas9 to the clinic are growing. The
first phase I trial of CRISPR in people — edit- 
ing immune cells to tackle lung cancer — will 
end in April. Firms including Locus Bio- 
sciences in Research Triangle Park, North Car- 
olina, and Eligo Bioscience in Paris will work 
towards trials using engineered viruses called 
bacteriophages to harness the CRISPR system 
against antibiotic-resistant bacteria. And the 
first trial using induced pluripotent stem (iPS) 
cells to treat Parkinson’s disease is
set to begin in Kyoto, Japan, by the 
year’s end. 


PARTICLE SURFING 

It's crunch time for a new method 
of accelerating particles. Scientists 
with the AWAKE experiment at 
CERN, Europe's particle-physics 
lab near Geneva, Switzerland, have 
shown that the principle behind a 
proposal to accelerate electrons on a 
wave of plasma is sound. Now, they 
must actually do it. If successful, the 
technique could eventually lead to 
smaller and cheaper colliders. 


OPEN ACCESS 

Who will blink first in the stand-off 
between German scientists and publishing giant 
Elsevier? Around 200 German institutions will 
lose access to Elsevier journals from 1 January 
until the sides can reach an agreement in a long- 
running battle over subscription prices. Open- 
access advocates will also watch the fate of the 
website Sci-Hub — which provides unauthor- 
ized free access to millions of paywalled papers 
— after a US court order in November shut 
down some of its domains.
Elizabeth Allen keeps careful records of the many treatments she has undergone to relieve the symptoms of chronic fatigue syndrome.


The invisible disability 


Research into chronic fatigue 
syndrome has a rocky past. 
Now scientists may finally be 
finding their footing. 


BY AMY MAXMEN 
Name a remedy, and chances are that Elizabeth Allen has
tried it: acupuncture, antibiotics, antivirals, Chinese herbs, 
cognitive behavioural therapy and at least two dozen more. 
She hates dabbling in so many treatments, but does so 
because she longs for the healthy days of her past. The 34-year-old 
lawyer was a competitive swimmer at an Ivy League university when
she first fell ill with chronic fatigue syndrome, 14 years ago. Her metic- 
ulous records demonstrate that this elusive malady is much worse than 
ordinary exhaustion. “Last year, I went to 117 doctor appointments 
and I paid $18,000 in out-of-pocket expenses,” she says. 
Dumbfounded that physicians knew so little about chronic fatigue 
syndrome — also known as myalgic encephalomyelitis or ME/CFS — 
Allen resolved several years ago to take part in any study that would 
have her. In 2017, she got her chance: she entered a study assessing 
how women with ME/CFS respond to synthetic hormones. 
After decades of pleading, people with the condition have finally 
caught the attention of mainstream science — and dozens of 
exploratory studies are now under way. Scientists entering the field
are using the powerful tools of modern molecular biology to search 
for any genes, proteins, cells and possible infectious agents involved. 
They hope the work will yield a laboratory test to diagnose ME/CFS 
— which might have several different causes and manifestations — 
and they want to identify molecular pathways to target with drugs. 

The US National Institutes of Health (NIH) in Bethesda, Maryland, 
bolstered the field last year by more than doubling spending for 
research into the condition, from around US$6 million in 2016 to 
$15 million in 2017. Included in that amount are funds for four 
ME/CFS research hubs in the United States 
that will between them receive $36 million 
over the next five years. 

The stakes are high because the field’s 
scientific reputation has been marred by 
controversial research. A 2009 report’ that a 
retrovirus called XMRV could underlie the 
disease was greeted with fanfare only to be 
retracted two years later. And in 2011 and 
2013, a British team reported that exercise 
and cognitive behavioural therapy relieved 
the symptoms of ME/CFS for many people in 
a large clinical study called the PACE trial*”. 
US and UK health authorities had made rec- 
ommendations based on the findings, but, 
starting around 2015, scientists and patient 
advocates began publicly criticizing the trial 
for what they saw as flaws in its design. The 
organizers of the trial deny that there were 
serious problems with it, but health officials 
in both countries have nevertheless been revising their guidelines. 

Patients, meanwhile, are adrift in a vacuum of knowledge about 
the condition, says Jose Montoya, an infectious-disease specialist at 
Stanford Medical School in California and one of Allen’s physicians. 
“ME/CFS has suffered from scientists applying the usual approaches,” 
he says. He hopes that sophisticated analyses of genomics, proteom- 
ics, metabolomics and more will help to change that. “It wasn't until 
the microscope became available that an Italian microbiologist could 
link cholera to the bacteria that caused it,” he says. “In the same sense,
we have not had the equivalent to the microscope until now.” 


EARLY DAYS 

In 1984 and 1985, an epidemic of persistent fatigue broke out in Lake 
Tahoe, Nevada. The US Centers for Disease Control and Preven- 
tion (CDC) tested people for Epstein-Barr virus, one cause of the 
fatigue-inducing illness called mononucleosis or glandular fever, 
but the results were inconclusive and the investigation was dropped. 
Around 1987, researchers coined the name chronic fatigue syndrome. 
But the media snidely called it ‘yuppie flu’. Doctors often told people
their symptoms were caused by neuroses and depression. 

But a small fraction of clinicians listened closely to patients — who
insisted that their debilitating exhaustion was not just in their minds. 
And whereas a little exercise might temporarily uplift someone with 
depression, individuals with ME/CFS would be bedridden for days 
after exertion. Some people also struggle with chronic impairment, 
some with intestinal disorders, and others completely lose the ability 
to walk. Anthony Komaroff, a physician-scientist at Harvard Medical 
School in Boston, Massachusetts, began conducting studies on the 
disease in the mid-1980s despite being discouraged by his colleagues. 
“I was emboldened by the fact that when I asked my colleagues why 
they were sceptical, they could not articulate a reason,” he says.

In the 1990s, Leonard Jason, a psychology researcher at DePaul 
University in Chicago, Illinois, started questioning basic epidemio-
logical information on ME/CFS. For one thing, the CDC described
the syndrome as rare and predominantly affecting white women. But 
Jason reasoned that clinicians could be missing many cases. Those 
who were diagnosed were the ones most likely to return for a second,
third or fourth medical opinion. And people who felt stigmatized, 
were confined to bed, were poor or had little social support might 
not go to such lengths to get a diagnosis. 

So, Jason’s team called almost 30,000 random Chicago phone 
numbers to ask whether someone in the household had symptoms 
of the disorder. If they did, the team brought them into clinics for 
evaluation. As a result of the findings from this‘ and other studies, the
CDC removed the word ‘rare’ from its description of the syndrome. In 
2015, a report’ from the US Institute of Medicine (IOM) estimated that 
836,000 to 2.5 million Americans have the 
disorder. Another study’ estimated that more 
than 125,000 people in the United Kingdom 
are living with ME/CFS. And a report’ from 
Nigeria suggests that the prevalence of the 
disease might be even higher there, perhaps 
exacerbated by other infectious diseases and 
poor nutrition. But these tallies are fraught, 
owing to the different ways in which doctors 
diagnose the condition. 

In many ways, people with ME/CFS remain 
invisible. Most have been dismissed by at least 
one physician. And society often ignores 
them, too. In the United States, financial 
pressures are common because health insur- 
ers might consider experimental treatments 
unnecessary, and employers might not feel 
that disability payments are justified. Even in 
countries where health care is a right, the situ- 
ation has been dire. Many patient advocates 
say that UK government agencies have essentially treated ME/CFS as 
if it were a strictly psychological condition, a conclusion that they argue 
was bolstered by the PACE trial’s findings that exercise and cognitive 
behavioural therapy relieve symptoms. The National Health Service 
(NHS) recommended these interventions, even after many patients 
complained that exercise dramatically worsens their condition. 

Epidemiologists have suggested’* that the anguish of contending 
with the disorder and society’s general dismissal of it contribute to an 
up to sevenfold increase in the rate of suicide for people with ME/CFS. 

Montoya will never forget one such tragedy. A decade ago, he 
opened an ME/CFS clinic for half a day each week at Stanford. One
afternoon, he received a call from a crying woman whose 45-year-old 
daughter had returned home to California after falling ill with ME/
CFS. The daughter had read about Montoya’s clinic online and wanted
an appointment, but Montoya was booked for a couple of years. In her 
suicide note, he says, the daughter asked that her brain be donated 
to him for research. “I feel so guilty, since those were the years with 
hundreds of patients on the waiting list,” he says.


IMMUNE SYSTEM 

Today, Montoya’s clinic is open five days a week. And in his research, 
he's exploring several avenues. The hormone study in which Allen is 
participating is looking for changes in how the endocrine system is 
regulated among people with ME/CFS, a factor that might explain 
why the disorder is more common in women than in men. But 
Montoya’s leading hypothesis is that ME/CFS begins with an infection 
that throws the immune system out of whack. 

Infections generally lead to inflammation when protein receptors 
on T cells, a kind of immune cell, recognize corresponding proteins 
carried by bacteria, parasites or viruses. The T cells multiply and 
catalyse an inflammatory attack that includes the replication of anti- 
body-producing immune cells, called B cells. In the past few years, 
researchers have revealed hints of an unusual immune response 
in ME/CFS. Most recently, last June, Montoya and his colleagues 
revealed’ abnormalities in the levels of 17 immune-system proteins 
called cytokines in people with severe cases of the syndrome. What 
disrupts the inflammatory response, however, remains unknown.
One possibility is that, as in some autoimmune disorders, T cells mis- 
takenly become alarmed by one of the body’s own proteins, rather 
than by an invader, and B cells secrete self-reactive antibodies. 

An accidental finding has lent support to this idea. In 2008, Øystein
Fluge, an oncologist at Haukeland University Hospital in Bergen, 
Norway, treated a lymphoma patient with rituximab, an antibody 
therapy that kills B cells. The patient told him that the drug resolved 
their ME/CFS. Fluge and his colleagues then conducted a placebo- 
controlled trial with 30 people who had the condition (and not 
cancer), and found that rituximab improved their symptoms’®. As 
word spread, Fluge was flooded with hundreds of e-mails from people 
asking to take part in his trials, and doctors around the world fielded 
desperate requests for the experimental therapy. 

Yet any hopes that Fluge dared to have were dashed last October, as 
he assessed data from an as-yet unpublished 151-person clinical trial 
and found that rituximab proved no better than the placebo. Fluge 
says the finer details of the trial might yet reveal whether a small sub- 
set of participants benefited. Like many others, he suspects that ME/ 
CFS might turn out to be several diseases, with different causes and 
underlying mechanisms. Therefore, what helps some people might 
not help others. This effect might not be discernible until research- 
ers can tease out how patients differ from one another. Still, the trial’s 
overall failure suggests that autoimmunity is not the main cause of ME/ 
CFS, says Derya Unutmaz, an immunologist at the Jackson Laboratory 
for Genomic Medicine in Farmington, Connecticut. Rather, he specu- 
lates that inflammation seen in ME/CFS might result from a problem 
on the regulatory side of a person’s immune system, which normally 
reins in the T-cell response to innocuous viruses, mould particles or 
other non-threatening stimuli. “Rituximab’s failure is very disappoint- 
ing for patients, but the fact that such a trial was done is a very impor- 
tant thing in the field,” Unutmaz adds. “By 
ruling this out, we can focus on other direc- 
tions.” This is the kind of scientific response 
that patient advocates have been fighting for 
since the 1990s. 


METABOLIC SYSTEM AND MICROBIOME 
Newsletters dating back decades document 
how activists have struggled to be recognized 
by scientists. In one column from 1998, the 
co-founder of an ME/CFS organization 
reports on a conference on the ailment in 
Boston. She notes that someone from ACT 
UP, a group known for driving research on 
HIV, was in attendance, “and may show us 
how to get more attention for the disease”. 
Through the 2000s, advocates accused the NIH of favouring grant 
proposals focused on psychiatric and behavioural studies, as opposed 
to those exploring physiological pathways. A sea change occurred in 
2015, however, with the IOM’s review’ of more than 9,000 scientific 
articles. “The primary message of this report,” concluded the IOM, “is
that ME/CFS is a serious, chronic, complex and systemic disease.” Soon
afterwards, NIH director Francis Collins said that the agency would 
support basic science to work out the mechanisms of the syndrome. 
In September last year, the NIH announced the winners of new 
grants in support of research hubs looking into ME/CFS. Some 
of the projects sound as if they duplicate each other, but that’s by 
design. Walter Koroshetz, head of the NIH’s National Institute of 
Neurological Disorders and Stroke in Bethesda and chair of the 
Trans-NIH ME/CFS Working Group, explains that the NIH sees 
strength in replication. “There has not been a coordinated effort 
to follow up on publications and to figure out which findings are 
most important, which can be reproduced and which fall away when 
you look at a different patient population,” he says. For this reason,
one of the NIH grants goes towards a centre at Research Triangle 
Institute in North Carolina that will merge ME/CFS data.

A $10-million, 5-year grant is also going to Unutmaz, who is 
studying the interplay between the immunological, metabolic and 
nervous systems of people with ME/CFS. As part of this, he will col- 
laborate with microbiologists to assess the bacteria living in patients’ 
bodies, and to see how shifts in those populations alter metabolites, 
such as glucose, that may in turn affect inflammation. Unutmaz 
admits that his studies are at an early stage, and says the point is to 
generate data to form sharper hypotheses. “We don’t know what 
we don’t know in this disease,” he says. Researchers at Columbia
University in New York City and Cornell University in Ithaca, New 
York, have won NIH grants to explore some of the same themes, and 
to delve into inflammation in the brain. 

Some CFS researchers argue that the NIH’s contribution remains 
too lean. “A real problem is that funders want to see papers coming 
out in a short time period, but this is a complex disease that requires 
long-term studies that are expensive to conduct,” says Eleanor Riley,
an immunologist at the University of Edinburgh, UK. Beginning in 
2013, Riley helped to launch and maintain an NIH-supported biobank 
of ME/CFS samples at the London School of Hygiene and Tropical 
Medicine. But the bank has been limited by funding constraints. 

Ronald Davis, a biochemist who directs Stanford’s Genome 
Technology Center, says that he too struggles to fund his lab’s work on 
ME/CFS. He points out that although HIV affects roughly the same
number of people in the United States — about 1.2 million — it received 
200 times as much funding from the NIH as ME/CFS did in 2017. 

In December, the Open Medicine Foundation in Agoura Hills, 
California, a research charity that Davis advises, announced its sup- 
port for an ME/CFS collaborative centre led by him. In one project, 
the team intends to finish analysing the complete genomes of 20 peo- 
ple severely ill with ME/CFS, along with the genomes of their family 
members, to look for a genetic predisposi- 
tion to the disease. Another project involves 
the development of what could be the first 
diagnostic test for ME/CFS. 

That test uses a small device containing 
2,500 electrodes that measure electrical 
resistance in immune cells and plasma from 
blood. When Davis exposed blood samples 
from people with ME/CFS to a stressor — a 
splash of salt — the chip revealed that the 
blood did not recover as well as samples 
from healthy adults. Davis is holding out on 
pronouncements, however, until he has con- 
ducted a study large enough to show clear 
and statistically significant effects — includ-
ing a difference between people with ME/
CFS and those with other conditions. “With XMRV, the problem was
that people jumped to conclusions,” Davis says. “I’ve learned that if 
it’s exciting, it’s probably wrong.” 

Davis knows the pain of disappointment personally. He started 
studying ME/CFS in 2008, when his son, Whitney Dafoe, became
incapacitated by the disease. Dafoe volunteered to be studied at 
his father’s centre. A member of the team, Laurel Crosby, recalls 
exchanging e-mails with Dafoe, discussing the research. But as 
Dafoe’s condition got worse, he stopped replying in sentences, and 
began answering text messages with just a ‘Y’ or an ‘N’. Then those,
too, stopped coming. Dafoe, now 34 years old, can no longer speak. 
He communicates with his parents through small motions, such as 
ripping holes in the shape of hearts in paper towels. 

A poster of Dafoe hangs in his father’s office. In it, he is standing
on a beach in northern California with his arms raised towards the
sky. Davis took the photo on one of the last days his son could walk.
“Now he cannot talk, he can't listen to music, he can't write, he lays in
bed all day, and there are thousands of patients like this, patients who
are embarrassed to be told that nothing is wrong with them,” Davis
says. So he is furiously testing the electrical device, as well as screening
blood samples for proteins and genetic signatures that might reveal a
biomarker for the disease. Not having clear criteria for a diagnosis has
made clinical trials particularly challenging.

Researcher Ronald Davis prepares a treatment for his son, Whitney Dafoe, who has chronic fatigue syndrome and can no longer walk or speak.

In 2015, David Tuller, a journalist turned ME/CFS advocate, 
published a critique of the PACE studies". Weeks later, six researchers 
signed an open letter to the editor of The Lancet, which published the 
initial PACE results, requesting a reanalysis of the data (see go.nature. 
com/2z9inlg). Last March, scientists and advocates did the same in a 
letter to Psychological Medicine — the journal that published the 2013 
PACE results — requesting a retraction (see go.nature.com/2brb5yx). 
A leading criticism was that the investigators had changed how they 
measured recovery during the course of the trial, making that outcome 
simpler to achieve. The PACE investigators have denied this charge 
and others on their website, writing that changes were made before 
they analysed the data, and wouldn't have affected the results. 

Patients and advocates disagree, and although the paper has not 
been retracted, the CDC subsequently abandoned the trial’s rec- 
ommendations. In September last year, the NHS announced that it 
would also revise its recommendations. In a corresponding report”, 
a panel concluded that recent biological models based on measurable 
physiological abnormalities require greater consideration. 

Despite the setbacks and the long delays, many argue that science 
is operating as it should — being self-critical and open to revision. In 
five years’ time, researchers should be able to pinpoint specific aber-
rations in the immune, metabolic, endocrine or nervous systems of
people with ME/CFS, and perhaps find genetic predispositions to
the condition. These indicators might yield diagnostic tests — and,
further down the road, treatments. 

Allen did not enrol in Montoya’s study with the expectation of a 
cure around the corner. She says she'll be happy if — at the very least 
— a younger generation can avoid the complete bewilderment she felt 
when her body suddenly failed her. “I know how long science takes,” 
says Allen. “I am going to try and do whatever I can do to make it 
move forward as fast as possible.”


Amy Maxmen writes for Nature from San Francisco, California. 
Cori Bargmann heads the Chan Zuckerberg Science Initiative, a philanthropic effort launched in late 2016 to support biomedical research.


Three ways to 
accelerate science 


Chan Zuckerberg Science will prioritize the elements that made roundworm studies soar 
— creativity, openness and shareable resources, writes its president, Cori Bargmann. 


In 1987, I joined the lab of Robert Horvitz
at the Massachusetts Institute of Tech-
nology in Cambridge as a postdoctoral
fellow. I was fascinated by the idea of using
genetics to probe the neural basis of behav-
iour. And a unique resource drew me to
the tiny transparent worm Caenorhabditis
elegans: a wiring diagram of the 302 neurons
in the adult worm’s nervous system.
Work led by John White, then a
C. elegans researcher at the Medical
Research Council’s Laboratory of Molecu-
lar Biology (LMB) in Cambridge, UK, had
mapped all the connections between the
worm’s neurons by slicing the animal into
thousands of sections and tracing each
cell using electron microscopy. This wir-
ing diagram, combined with the worm’s
short life cycle of a few days, offered a
tremendous opportunity to relate the
development and function of the nervous
system to genes and neurons. And it was
just one of the many shared resources avail-
able for C. elegans research.

The findings made using C. elegans 
have been remarkable. Among these are 
the caspase system that controls pro- 
grammed cell death; the netrin system 
that guides neuronal connectivity; and the 
post-transcriptional gene-regulatory 
pathways involving microRNAs and
small interfering RNAs. 

I believe that the success of these projects 
emerged in part from a unique research cul- 
ture and infrastructure. Now I want to help 
put in place similar opportunities on a larger 
scale, as president of the Chan Zuckerberg 
Science Initiative, a philanthropic effort 
launched in late 2016 to support biomedical 
research. 


THE THREE INGREDIENTS 
What made the C. elegans field successful? 

A common reference. By the mid- 
1960s, fruit flies and yeast had already 
been studied for decades. But biologist 
Sydney Brenner, then at the LMB, wanted 
to develop a new model organism for stud- 
ying the big questions in development and 
neuroscience. He picked C. elegans. 

The LMB group began realizing 
Brenner’s goal by developing a shared 
infrastructure. Brenner and his PhD stu- 
dent Jonathan Hodgkin created genetic 
tools, such as strains of worms with well- 
characterized mutations, and mapped the 
functions of hundreds of genes. Biologist 
John Sulston led a team that described the 
complete lineage of all cells, documenting 
every step in the transformation of a single-
cell embryo to the adult worm (J. E. Sulston
et al. Dev. Biol. 100, 64-119; 1983). White, 
Brenner and their team mapped the con- 
nections of all of the worm’s neurons, nam- 
ing every neuronal cell and mapping its 
lineage and place in the circuit. 

Descriptive science — observing, record- 
ing, describing and classifying phenomena 
— is often valued less than hypothesis test- 
ing. But the common resources that result 
help everyone. Every experiment I have 
done has been grounded in White and col- 
leagues’ wiring paper, affectionately known 
as The Mind of a Worm (J. G. White et al. 
Phil. Trans. R. Soc. Lond. B 314, 1-340; 
1986). 

The success of these projects, and the 
recognition of their value by the commu- 
nity, meant that it was easy to convince 
C. elegans researchers of the worth of the 
first genome projects discussed in the 
1990s. They were similarly game for mak- 
ing and sharing the first RNAi libraries 
(collections of small interfering RNAs for 
disrupting gene function, matched to every 
gene in the worm’s genome), the Worm- 
base organismal database (a repository of 
everything that’s known about C. elegans 
biology) and, more recently, the global 
genetic-diversity resource CeNDR (www. 
elegansvariation.org). 

Creative exploration. Today, people are 
often encouraged to stay in a research niche 
for long stretches of their careers — to learn 
‘more and more about less and less’. One
effect of this is that students stay in the same 
fields as their advisers, and both learn less
than they might have done had they diversi- 
fied. 

By contrast, the MRC mavens took a 
gamble that there were many interesting 
questions left in biology, and that buying 
lots of lottery tickets — in the form of dif- 
ferent research areas — would pay off for 
the success and prestige of the field. Thus, 
there was a conscious decision among 
those involved in the foundational work 
on C. elegans to maximize discovery by 
encouraging people to explore the worm’s 
biology widely. When I joined his lab, 
Horvitz told me I could study any problem 
that could be addressed in a worm. 

Openness. Today, two concerns tend 
to come up in discussions about releasing 
findings before their formal publication: 
is the work accurate, and will people steal 
the results? 

When I started working on C. elegans, 
people published in a semiregular news- 
letter called the Worm Breeder’s Gazette 
(WBG). Most of the groups that were using the worm as a model organism published in every issue; the one-page abstracts typically described a single result. The WBG was fast. A few weeks or months after you had a result, it would be out there for everyone to see. In fact, some WBG abstracts preceded papers by five years or more.

Some of the findings reported in the 
WBG didn't hold up long-term. And that 
was okay; results that can’t be replicated 
soon get ignored. As for stealing others’ 
work, I think that the very openness of 
the C. elegans field acted as a deterrent. Everyone knew what was in the WBG, and there was a clear expectation that if you used someone else’s result, you included that person in your study or cited them. The scientists who read the WBG were the same ones who were going to review your grants, papers and case for promotion, so the implicit requirement to respect that culture had teeth. In many cases, the openness seemed to relieve tensions; people could find out in advance whether similar work was in progress in another lab, and coordinate publications.

The roundworm Caenorhabditis elegans.


SHAPING SCIENCE TODAY 

The mission of the Chan Zuckerberg Science 
Initiative, founded in 2016 by Mark Zucker- 
berg and Priscilla Chan, is to support science 
and technology that will make it possible 
to cure, prevent or manage all diseases by 
the end of the century. It’s a bold goal. But 
the end of the century is still 82 years away. 
Going back in time a similar distance, much 
of modern medicine would have been 
unthinkable — from organ transplants and 
deep brain stimulation to treating cancer by 
manipulating the immune system. 

All of these advances were built on a 
foundation of basic biomedical science. 
To enable the next generation of discover- 
ies, we at the Chan Zuckerberg Initiative 
want all of biomedical science to be faster, 
more robust, sharable and scalable. We're 
starting a number of different programmes 
— both locally and globally — to try out 
ideas for accelerating science and driving 
collaboration. 

First, we want to support scientific infra- 
structure projects that change the land- 
scape for research fields. In collaboration 
with other groups and funders, we are sup- 
porting the Human Cell Atlas (HCA), an 
endeavour to map all the cells in the human 
body. For the trillions of cells that make up 
the human body, we don’t know how many 
cell types there are, nor their exact num- 
bers, locations, molecular compositions 
and spatial relationships in tissues and 
organs. Such knowledge could benefit all 
biologists who study humans. 

In addition to funding experimental 
scientists working on the HCA, the Chan 
Zuckerberg Initiative is funding external 
collaborators and an in-house group of 
software engineers and computational 
biologists focused on developing new data 
platforms and tools for biomedical science. 
This is an opportunity, because many of the 
advances in technology that have happened 
in the commercial sector have not been 
available to academic science. As a neuro- 
scientist, I take this personally: numerous 
recent innovations in machine learning and 
neural networks originated in neurosci- 
ence, so biologists should be able to share 
the benefits. 

Second, to foster creativity, we plan
to support people who want to work in 
new areas — especially young researchers 
setting up their own labs. Most scientists 
do their most creative work at this early 
stage of their careers. But — understand- 
ably — it’s often hard to obtain funding 
unless you can demonstrate expertise in 
a particular area. The Chan Zuckerberg 
Initiative could fill a niche by taking on 
more risks than other funders. That risk 
is worthwhile if it brings people into bio- 
medical areas in which the need is great 
but current research is narrowly directed. 
Unfortunately, disease-relevant fields can 
be some of the hardest to break into for 
someone with a new idea or approach. 
Certain disease foundations, such as the 
Hereditary Disease Foundation for Hun- 
tington’s disease or the Simons Founda- 
tion Autism Research Initiative, have 
done this well in the past. But we think 
that there is room to scale up this model 
to many other biomedical problems. 

Finally, on openness. We believe that 
research advances when people build 
on each other’s work. So our princi-
ples include making data, protocols, 
reagents and code freely available for 
other scientists to use. As an example 
of this approach, the HCA has com- 
mitted to making its reference data 
publicly available after quality-control 
checks. Indeed, the Chan Zuckerberg 
Initiative engineering team and our 
HCA collaborators are building all of 
the software for the ‘data coordination arm’ of the project on the open-source platform GitHub.

We’re also supporting external groups
that share these values and goals. For 
instance, we're funding bioRxiv, the larg- 
est and fastest-growing preprint reposi- 
tory for the biological sciences — and a 
leader in bringing biology towards the 
level of sharing that’s expected in the 
physical and computer sciences. 

The Chan Zuckerberg Initiative is 
just starting, and we have a lot to learn. 
But I’ve been lucky to work in areas in 
which the free exchange of ideas and 
results is the norm. In my experience, 
such an approach creates the most 
dynamic fields. Now I have the chance 
to lead a new funding venture, and to 
explore whether openness or dynamism 
comes first. After all, as scientists we 
do experiments; as funders, we can do 
experiments too. 


Cori Bargmann is president of science 
at the Chan Zuckerberg Initiative in Palo 
Alto, California; and professor of genetics, 
neuroscience and behaviour at the 
Rockefeller University, New York, USA. 
e-mail: science@chanzuckerberg.com 

COMMENT


An enclosure for measuring gas exchange between plants and the atmosphere at a station in Finland. 


Build a global 
Earth observatory 


Markku Kulmala calls for continuous, comprehensive 
monitoring of interactions between the planet’s 
surface and atmosphere. 


Climate change. Water and food security. Urban air pollution. These environmental grand challenges are all linked, yet each is studied separately. Interactions between Earth’s surface and the atmosphere influence climate, air quality and water cycles. Changes in one affect the others. For example, increasing carbon dioxide enhances photosynthesis. As they grow, plants withdraw greenhouse gases from the atmosphere, but they also release volatile organic compounds such as monoterpenes. These speed up the formation of aerosol particles, which reflect sunlight back into space.

Our actions — such as emission-control policies, urbanization and forestry — also affect the atmosphere, land and seas. Satellites and stations on the ground track greenhouse gases, ecosystem responses, particulate matter or ozone independently of each other. Coupled observations are occasionally performed, but only in short, intensive bouts. Vast areas of the globe — including Africa, eastern Eurasia and South America — are barely sampled.


The result is a cacophony of information 
that yields little insight. It is like trying to 
forecast weather in November with spotty 
measurements of rain, wind, temperature or 
pressure from June. 

The answer is a global Earth observatory 
— 1,000 or more well-equipped ground 
stations around the world that track 
environments and key ecosystems fully 
and continuously. Data from these stations 
would be linked to data from satellite-based 
remote sensing, laboratory experiments and 
computer models. 

Researchers could find new mechanisms
and feedback loops in this coherent data set.
Policymakers could test policies and their 
impacts. Companies could develop envi- 
ronmental services. Early warnings could 
be provided for extreme weather, and quick 
responses initiated during and just after 
chemical accidents. 

A global observatory has been discussed for more than a decade, but is only now feasible. Instruments have matured; for example, today’s mass spectrometers can measure thousands of atmospheric chemicals at once. My team and our collaborators have shown how a rounded set of environmental measurements can be obtained at one station, called SMEAR II (Station for Measuring Ecosystem-Atmosphere Relationships), in the boreal forests of Finland.

Regional initiatives to combine and 
broaden space- and ground-based monitoring 
are established well enough to roll out simi- 
lar stations globally. These include PEEX (the 
Pan Eurasian Experiment) and the DBAR 
(Digital Belt and Road), a research initiative 
related to China’s One Belt and One Road 
Initiative — a development strategy covering 
a swathe of 65 countries between China and 
Europe that reaches as far south as Kenya. The 
World Meteorological Organization (WMO) 
is taking steps to establish a global observa- 
tory. And the urgency is here: carbon emis- 
sions must decline after 2020 (ref. 8). 

The scale of the enterprise remains 
daunting. It requires a wholesale shift in 
how environmental data are collected and 
disseminated. 


AN INTEGRATED NETWORK 

Incomplete coverage from ground stations 
is the main limit to observations of Earth’s 
conditions. Satellites can continuously monitor some compounds, such as CO2, ozone and aerosols, almost planet-wide. But they cannot resolve processes or fluxes, or trace the hundreds more compounds of interest. Satellite data must be ‘ground-truthed’. Models need data to validate them.

Current networks of ground stations have 
been set up without considering the big pic- 
ture. Each discipline or team designs and 
builds stations to suit its purpose. Green- 
house gases, atmospheric chemicals and 
ecosystems are monitored at different sites. 
Funding agencies focus on national interests. 

The SMEAR II station takes a more 
integrated approach. Using state-of-the- 
art atmospheric mass spectrometers, cloud 
radars and lidars (light detection and rang- 
ing instruments), it observes more than 
1,000 variables. These include greenhouse 
gases, trace gases and aerosols, as well as 
indicators of photosynthesis, soil tempera- 
ture, moisture and nutrient gradients. 

The challenge is to set up similar stations 
around the world — and to incorporate 
local expertise. Good places to start would be the three global regions where coverage is sparse, and megacities.


HOT SPOTS 

The Arctic and boreal regions. Former 
Soviet Union countries, including Russia 
and Kazakhstan, are crucial laboratories for 
global change. They are rich with minerals, 
oil and natural gas: Siberia contains 85% of 
Russia's prospected gas reserves, 75% of its 
coal and 65% of its oil reserves. And climate change is rapidly altering their environments.
There is much we don't know. How rapidly 
will permafrost disappear? Does Arctic 
greening sequester carbon or produce aero- 
sols? Will methane emissions increase drasti- 
cally, and so ramp up global warming? 

In this region, as elsewhere, researchers 
need to observe aerosols together with 
greenhouse gases (such as CO2 and methane)
and other trace gases (volatile organic com- 
pounds, nitrogen oxides, ozone, sulfur diox- 
ide, carbon monoxide and ammonia). Two 
stations are starting to increase the range of 
observations that they can make: the Tiksi 
Hydrometeorological Observatory in the 
River Lena delta in eastern Russia and the 
Zotino Tall Tower Observatory (ZOTTO) 
in southwest Siberia, 500 kilometres from 
Tomsk. Ideally, to cover the region, around 
30 comprehensive stations will be needed, 
spaced 1,000 kilometres apart. A global 
observatory must appear on the agendas of 
upcoming meetings of the Russian govern- 
ment and the Arctic Council. 


Africa. The continent’s population is 
increasing fast — it has doubled since 1987, 
and it reached 1.2 billion people in 2015. 
Meanwhile, once-fertile areas have become 
dry, challenging water and food supplies and 
requiring strategies to store rainwater and 
retain soil moisture. Water and other bio-
geochemical cycles need to be understood 
better. But monitoring in Africa is limited 
mainly to short-term observations of carbon 
sinks and sources (by the global network 
FLUXNET) and to some air-quality obser- 
vations that measure about a dozen variables. 

A minimum of 30 stations should be built 
in Africa. These must comprise at least one 
in each main ecosystem that is relevant to 
food and water, including rainforests, savan- 
nahs and semi-deserts. Prime sites should 
be identified with local organizations and 
scientists. United Nations organizations, 
development banks and private foundations 
that work in Africa should add their support. 


South America. The Amazon basin is a 
crucial place to monitor, owing to its vast 
area and influence on global carbon and 
hydrological cycles. It forms its own climate 
system, which is changing as a result of
agricultural expansion and deforestation. 
These disturbances, together with climate 
shifts, will affect carbon storage and water 
cycles. Yet there is little information availa- 
ble, and no combined observations. Only the 
Amazon Tall Tower Observatory (ATTO), 
located about 150 kilometres northeast of 
Manaus, Brazil, is taking steps to increase 
the range and continuity of data obtained. 

South America needs at least 20 such sta- 
tions: 7 should be located in the Amazonas 
region. The exact sites need to be identified 
with local scientists and organizations. 


Four hot spots: Setting up stations to monitor air, soil and ecosystems across Eurasia, Africa, South America and in major cities would fill crucial gaps in a global observatory network.

African nations such as Somalia need better monitoring of water cycles to improve strategies that help to retain soil moisture.


Cities. Urban areas are growing: the urban 
population has tripled since 1970. More than 
55% of the global population lives in urban 
areas. Better data on air quality is a particu- 
larly pressing need. Currently, fewer than 
15 variables are typically observed at sites in 
urban areas, and the data quality is often poor. 

More than 30 megacities worldwide each
have more than 10 million inhabitants, and
hundreds of cities have populations in the 
millions. Each large metropolis should have 
at least one comprehensive observatory and 
a suite of simpler local stations. The Global 
Mayors’ Forum should put the global obser- 
vatory on its agenda, as should the G20 
countries. 


COST EFFECTIVE 

A global observatory, comprising a network of 1,000 super stations, needs to be established within 10-15 years. Costs would be around €10 million (US$11.8 million) to €20 million per station, or €10 billion to €20 billion for the whole thing. This is comparable to the construction cost of the Large Hadron Collider near Geneva, Switzerland, or that of US President Donald Trump’s proposed Mexican wall.

Deforestation in the Amazon basin is changing its climate system.

Stations should be constructed or 
upgraded using a modular approach. The 
different modules would target atmos- 
pheric chemistry, micrometeorology and 
soil chemistry, for example. Each block 
would cost around €500,000 to €2 million to 
develop and install. Annual servicing would 
add about 3-6% per year to these costs. 
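As a rough check of the figures above, the short sketch below multiplies out the network's build cost from the station count and per-station range given in the text. It also applies the 3-6% annual servicing rate to whole-station costs; that is an assumption, because the text quotes the rate for individual modules. The constant and function names are illustrative only.

```python
# Back-of-envelope cost arithmetic for the proposed global observatory.
# Station count and per-station costs come from the text; applying the
# 3-6% module servicing rate to whole-station costs is an assumption.

N_STATIONS = 1_000
STATION_COST_EUR = (10e6, 20e6)   # low and high estimate per station
SERVICING_RATE = (0.03, 0.06)     # annual servicing as a fraction of build cost


def network_build_cost(n_stations: int, cost_per_station: float) -> float:
    """Total construction cost for n_stations at a fixed per-station cost."""
    return n_stations * cost_per_station


def annual_servicing(build_cost: float, rate: float) -> float:
    """Yearly servicing cost, modelled as a fixed fraction of build cost."""
    return build_cost * rate


if __name__ == "__main__":
    low = network_build_cost(N_STATIONS, STATION_COST_EUR[0])
    high = network_build_cost(N_STATIONS, STATION_COST_EUR[1])
    print(f"Build cost: EUR {low / 1e9:.0f} billion to {high / 1e9:.0f} billion")
    svc_low = annual_servicing(low, SERVICING_RATE[0])
    svc_high = annual_servicing(high, SERVICING_RATE[1])
    print(f"Servicing: EUR {svc_low / 1e6:.0f} million to "
          f"{svc_high / 1e9:.1f} billion per year")
```

Run as a script, it reproduces the €10 billion to €20 billion total and, under the stated servicing assumption, suggests upkeep of roughly €300 million to €1.2 billion a year.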

The instruments will need to be 
harmonized, calibrated and standardized. 
They must be developed and upgraded as 
techniques improve. Data sharing must be 
considered — information must be reliable 
and open. Data scientists will be needed 
to analyse data and develop products that 
flow from the stations to users and archives. 
Professional staff will be needed to run the 
stations. 


Existing networks need to coordinate 
their practices. These include scientific pro- 
grammes such as PEEX, the DBAR initiative 
and FLUXNET; global organizations such as 
the WMO and Future Earth; private global 
foundations and companies; and municipal, 
governmental and UN bodies. 

Complementary infrastructures such 
as the following should be combined: the 
Integrated Carbon Observation System 
(ICOS); the WMO’s Global Atmosphere 
Watch; the Aerosols, Clouds, and Trace gases 
Research Infrastructure network (ACTRIS); 
Europe’s Long-term Ecosystem Research 
(LTER); and the infrastructure for Analy- 
sis and Experimentation on Ecosystems 
(AnaEE). The first step would be the open 
exchange of data between them, which is 
already starting to happen in Europe. Next, 
the networks should establish joint stations 
across other continents, especially in the hot 
spots mentioned. SMEAR I proves that this 
is feasible and need not be expensive. 


Greenhouse-gas measurements in Siberia will help to reveal the effects of melting permafrost.

Megacities such as Lagos need better data on air quality.


Once we establish the global observatory, 
we will have the tools to understand how the 
Earth system works.


Markku Kulmala is a professor of physics 
and director of the Institute for Atmospheric 
and Earth System Research at the University 
of Helsinki, Finland; and head of the Aerosol 
and Haze Laboratory at the Beijing 
University of Chemical Technology, China. 
e-mail: markku.kulmala@helsinki.fi 
Hot tickets for 2018


NASA turns 60, Antarctic dinosaurs lumber into view, fruit waste transforms fashion 
and Madeleine L’Engle’s sci-fi classic A Wrinkle in Time hits the screen. Around the
world, museums and galleries will explore everything from our relationship with time 
to the brain’s beauty, Fatimid science and the wonder of graphene. Nicola Jones reports. 


Mad Minds 

Victor Hugo’s Houses, Paris 

Until 18 March 

In the nineteenth century, psychiatry 
was evolving. While patients in asylums 
such as London’s Bethlem Hospital 
(nicknamed Bedlam) suffered indignities 
and abuse, a new movement promoted ethical treatments. Instead of chains and isolation, it encouraged freedom of movement and self-expression. Practitioners such as Scottish physician William Browne started to pay closer attention to the art and writing of people with mental illnesses. And psychiatrists became the first collectors and critics of these works, which some have seen as representing artistic drive at its rawest. This show includes pieces amassed by Browne, who pioneered ideas of art therapy at Crichton Royal Hospital in Dumfries, UK, alongside similar collections from Germany and Switzerland.

24 | NATURE | VOL 553 | 4 JANUARY


Painted Surfaces 

Iziko South African National Gallery, Cape Town 
Until 1 April 2018 

A chance to peer beneath the surface of paintings by some of South Africa’s greatest artists, including Stanley Pinker, Irma Stern, Frederick I’Ons and George Pemba, awaits at this exhibition. The result of a three-year collaboration between institutions including the University of Cape Town and the University of the Western Cape, the exhibition explores the artists’ techniques and the histories of their works through infrared photography, ultraviolet light, X-ray imaging and microscopic analysis. Infrared images, for example, reveal that the tin support of I’Ons’s Krantzdrift: Landscape with Cattle was an enamel shop sign advertising Peek Frean biscuits, which helped to date the work to the late nineteenth century.

Wonder Materials — Graphene and Beyond
Hong Kong Science Museum 
Until 18 April 


Graphene — sheets of carbon one-millionth 
the thickness of a human hair and 200 times 
as strong as steel — was known to exist from the 1940s, but wasn’t isolated until
2004. That year, physicists Andre Geim and 
Konstantin Novoselov at the University of 
Manchester, UK, managed to separate a flake 
of graphene one atom thick from a lump of 
graphite using sticky tape (six years later, 
they won the Nobel Prize in Physics). This 
highly useful material is now making its mark 
in industry, with applications in everything 
from speciality batteries to tennis rackets. 
Along with its discovery and commercial 
applications, this exhibition focuses on the 
material’s possible future. 


Time Unwrapped 
King’s Place, London 
6 January - 31 December 


This year-long series of more than 50 musical and spoken-word events explores humanity’s relationship with time. Kicking off with a talk on timekeeping by David Rooney, keeper of technologies and engineering at London’s Science Museum, the series meanders through an eclectic mix of lectures, Bach cantatas, jazz and folk concerts. Cosmologist Malcolm Longair and music critic Tom Service will ponder musical revolutions of the early twentieth century that paralleled Albert Einstein’s development of relative time. The line-up also includes experimental physicist Helen Gleeson, who produced the first graphene-based liquid-crystal device; a dramatic recreation of Douglas Adams’s novel Dirk Gently’s Holistic Detective Agency (William Heinemann, 1987) by actor Geoffrey McGivern; a “human clock” by Hang player and percussionist Manu Delago and others; and a collage by pianist Alasdair Beatson that melds music by Beethoven with the nocturnal sounds of insects.


The Beautiful Brain: The Drawings of Santiago Ramón y Cajal

Grey Art Gallery, New York City

9 January - 31 March

Spanish pathologist and Nobel laureate Santiago Ramón y Cajal was a founder of modern neuroscience and an accomplished artist. His dissections and drawings of the human brain in the late nineteenth century provided definitive evidence that the nervous system is made up of discrete cells, including neurons. Ramón y Cajal also discovered a new type of cell, later named after him, amid neurons in the gut. Some 80 of his drawings are in this touring show, which opened at the Weisman Art Museum in Minneapolis, Minnesota, and will move to the MIT Museum in Cambridge, Massachusetts, in May.


The World of the Fatimids 

Aga Khan Museum, Toronto, Canada 

10 March - 2 July 

The educational, scientific and artistic legacy 
of the Fatimids, an Arab dynasty that ruled 
over swathes of North Africa in the tenth 
and eleventh centuries AD, features in this 
sumptuous show. In Cairo, the Fatimids founded one of the world’s oldest degree-granting educational institutions — Al-Azhar University — in 970, as well as one of the era’s greatest libraries. And their rule advanced science: the pioneer of optics Ibn al-Haytham, for example, lived in Cairo under the caliphate. The exhibition features
marble reliefs from Cairo’s Museum of 
Islamic Art, masterpieces in metal, and 
ceramic lustreware, a Fatimid innovation. 
Drone videography and virtual-reality films 
provide a peek at what the Egyptian capital 
might have looked like a millennium ago. 


KING TUT: Treasures of the Golden Pharaoh

California Science Center, Los Angeles
24 March 2018 - January 2019

The body of Tutankhamun, the child-pharaoh who ruled Egypt more than 3,000 years ago, was discovered in 1922 in the most complete royal tomb ever found in the region. As we near the centenary of that find, Tut’s belongings hit the road. The Egyptian Ministry of Antiquities is working with partners to present more than 150 artefacts from the tomb — the largest assembly of original objects ever displayed outside Egypt. (Previous tours, including the 1970s Treasures of Tutankhamun exhibition that drew more than 8 million visitors to its US sites alone, featured about 50.) This year’s show includes a life-size wooden statue of Tutankhamun, a gilded ceremonial bed, a statue of the god Duamutef (pictured) and a jewelled coffinette that held the pharaoh’s liver. His famous death mask and mummified body remain in Egypt. The exhibition will move on to Europe after its Los Angeles premiere.


Fashioned from Nature 
Victoria and Albert Museum, London 
21 April 2018-27 January 2019 


‘Fashion victim’ will gain a 
whole new meaning at this 
show. For centuries, nature 
has fallen prey to fashion 
frenzies. In the Victorian era, 
for example, birds’ body parts were used to make jewellery and trim hats; an 1875 pair of earrings made from the taxidermied heads of honeycreepers (pictured) will be

IN A CINEMA NEAR YOU...

Alongside the usual offerings of superhero sequels and Star Wars flicks, 2018 brings a handful of hotly anticipated science-tinged films, from fantasy to nearly-non-fiction.


Annihilation From Alex Garland, writer-director of Ex Machina (2015), comes the story of a biologist (Natalie Portman, pictured), an anthropologist, a psychologist and a surveyor on an expedition into Area X. What they find in this bizarre, alien-influenced environmental disaster zone is unexpected. US release: 23 February.


A Wrinkle in Time This adaptation of
Madeleine L'Engle’s classic 1963 sci-fi 
story, directed by Ava DuVernay, features 
a star-studded cast that includes Oprah 
Winfrey, Reese Witherspoon and Chris 
Pine. Learning that her astrophysicist 
father is held captive on a distant planet, 
youthful heroine Meg Murry works with 
family and a band of unusual friends to 
save him. US release: 9 March. 


Ready Player One Steven Spielberg 
directs the film of Ernest Cline’s 2011 novel. 
In a dystopian 2040s, people escape over- 
populated slums by living, studying and 
working in a virtual reality, the OASIS. When 
its quirky creator dies, he leaves behind a 
treasure hunt for his fortune — and a pack 
of teenagers aim to beat big business to the 
prize. US release: 30 March. 


First Man NASA's mission to land a 
man on the Moon gets the Hollywood 
treatment, with Ryan Gosling as Neil 
Armstrong. The rights to James R. Hansen’s biography, on which the film is based, were bought in 2003, but filming — directed
by Damien Chazelle, who worked with 
Gosling on 2016's La La Land — began 
only after Armstrong’s death in 2012. 
US release: 12 October. 

Neuroscience pioneer and artist Santiago Ramón y Cajal is the subject of a touring exhibition.


on display. The exhibition will chart the use of natural materials over 400 years, from silk, wool and cotton to whalebone and turtle shell. More environmentally friendly modern materials will feature, too: clothes crafted from recycled plastic bottles or the fibrous remains of juiced oranges; a dress grown from plant roots by artist Diana Scherer; and a leather substitute created from wine-industry grape waste. If that doesn’t wow you, there’s a gown of bioluminescent, genetically engineered silk.


Teeth 
Wellcome Collection, London 
17 May - 19 September 


How did dentistry evolve from fairground 
entertainment in the early eighteenth century 
to today’s highly skilled profession? This 
exhibition traces the medical and scientific 
history of oral hygiene and dentistry, as well 
as their evolving association with beauty and 
wealth. It will draw on Wellcome Collection 
images, objects and artworks, which include 
documents on how William Shakespeare 
cleaned his teeth, mercury poisoning from early fillings and the “means of correcting and purifying a tainted or unpleasant breath” in the nineteenth century. A terrifying scanning electron microscope image of a decayed tooth may also feature.


NSO Pops: Space, the Next Frontier 
John F. Kennedy Center for the Performing Arts, 
Washington DC 

1 June-2 June 


On 29 July 1958, US president Dwight 
Eisenhower signed the act that gave birth 
to NASA. In celebration of the agency’s 
60th birthday, the US National Symphony 
Orchestra will play music to a backdrop of 
images from the Hubble Space Telescope, the International Space Station, the Curiosity rover on Mars, and famous sci-fi films and television programmes. Come to hear space-inspired musical selections, including favourites from Star Wars and Star Trek, and a new commission by composer Michael Giacchino (winner of a 2010 Academy Award for his score of the animated feature Up).


Antarctic Dinosaurs 
Field Museum, Chicago, Illinois 
15 June 2018 - 6 January 2019 


Some 200 million years ago, dinosaurs roamed a lushly forested Antarctica, which was then part of a supercontinent that included what are now Africa and South America. In collaboration with the Natural History Museum of Utah in Salt Lake City and other institutions, this travelling exhibition spotlights the continent’s Mesozoic landscape, as well as the current logistical challenges of doing science in a harsh climate. Dozens of fossils and specimens will be on display, spanning modern plants to extinct animals that lived on the vast landmass before the dinosaurs. The show includes remains and replicas of the first and largest Antarctic dinosaurs discovered: the 7-metre-long Cryolophosaurus; Glacialisaurus; and two juvenile prosauropods. After opening at the Field Museum, it will travel to California, Utah and elsewhere.


Catastrophe and the Power of Art 
Mori Art Museum, Tokyo 
6 October 2018 - 20 January 2019 


The human ability to bounce back following 
disasters — whether the global financial 
crisis of 2008 or the earthquake and 
tsunami that hit Japan in 2011 — is the 
focus of this show. The collection spans 
both personal responses to catastrophe 
and examinations of wider associated 
social problems, such as the dream of 
unrestricted economic growth and the 
hubris of humanity’s urge to control nature. 
The exhibits will include works by Japanese photographer Naoya Hatakeyama and New York-based hacktivists Eva and Franco Mattes.


Audubon’s Birds of America 
New-York Historical Society Museum and Library 
Ongoing 
In 1820, US naturalist John James Audubon 
declared his intention of depicting every bird 
in North America. Arranging specimens in 
lifelike poses using wires and threads, he 
painted them in watercolour and life-size. 
His masterpiece The Birds of America 
(1827-38) contains 435 illustrations 
(pictured, the wild turkey, Meleagris 
gallopavo) and introduced 25 new 
species; it deeply influenced naturalists 
such as Charles Darwin, who referred 
to Audubon’s work in his 1859 On the 
Origin of Species. This exhibition features all the original paintings — also available online (see go.nature.com/2c7i3i1) — alongside plates used for the book.


SANTIAGO RAMON Y CAJAL, SELF PORTRAIT, C1885 


JOHN JAMES AUDUBON 

Brexit must protect
European science 


As president of ALLEA (All European Academies), I am disappointed by the slow pace of Brexit negotiations and the lack of awareness of their impact on European science. Research in the European Union needs collaboration with the United Kingdom and vice versa. I agree that greater clarity on the future relationship between EU and UK science policy is paramount (see V. Ramakrishnan Nature 551, 543; 2017).

Much has been said about the 
implications of Brexit for UK 
science, with little mention of the 
detrimental effects on European 
science as a whole. EU research 
depends more than ever on 
international collaboration 
and competitiveness. The 
UK scientific system contributes 
much of this and is a core 
strength. Collaboration with 
EU partners now makes up more 
than 30% of all UK scientific 
publications. 

As Brexit negotiations 
move into the second phase, 
we must take steps to ensure 
UK participation in European 
science and in the design and 
implementation of the next 
EU research and innovation 
Framework programme. 
Severing our close collaborations 
would have irreversible long- 
term effects on the quality of 
scientific research and would 
touch the lives of all citizens 
of the EU and Britain. In the 
current negotiations, the sole 
certainty seems to be that we 
urgently need a more ambitious 
deal for moving European 
science beyond Brexit. 

Günter Stock ALLEA, Berlin,
Germany. 
president-allea@bbaw.de 


Climate engineering
includes land and sea

Stephen Andersen calls for governance of climate engineering, suggesting that the Montreal Protocol could take on full responsibility for the task (Nature 551, 415; 2017). However, the protocol’s assessment experts focus solely on stratospheric processes and, in our view, would probably be unable to take on the regulation of the full range of ambitious geoengineering projects.

The range of proposed techniques includes land- and ocean-based removal of greenhouse gases from the atmosphere, as well as increasing the amount of sunlight reflected from land and ocean surfaces. All these methods are classed as geoengineering by bodies such as the Intergovernmental Panel on Climate Change and the Convention on Biological Diversity.

The London Protocol on marine geoengineering already has draft regulations in hand for techniques such as ocean fertilization (see go.nature.com/2ow7ikp). These aim to protect the marine environment and human health, with input from the Joint Group of Experts on the Scientific Aspects of Marine Environmental Protection (GESAMP; see go.nature.com/2bksdnn).
Chris Vivian Burnham-on-Crouch, UK.

Phillip Williamson University of 
East Anglia, UK. 

Philip Boyd University of 
Tasmania, Australia. 
chris.vivian2@btinternet.com 


Boost soil carbon for 
food and climate 


The ‘4 per 1,000’ initiative was launched by the French government at the COP21 Paris climate summit in 2015. It aims to boost carbon storage in agricultural soils by 0.4% each year to help mitigate climate change and increase food security (www.4p1000.org). Despite the global importance of these societal imperatives, soil-carbon sequestration is still not on the political agenda, and was not formally discussed at the Bonn COP23 meeting in Germany in November 2017.
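To make the 0.4% figure concrete, here is a minimal arithmetic sketch. It assumes, for illustration only, that the target means adding 0.4% of the current soil-carbon stock each year (compound growth), and contrasts that with a linear reading in which 0.4% of the initial stock is added annually. The function names and the starting stock of 100 arbitrary units are hypothetical, not figures from the initiative.

```python
def compound_stock(initial_stock: float, years: int, rate: float = 0.004) -> float:
    """Stock after `years` if it grows by `rate` of its *current* value each year."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial_stock * (1.0 + rate) ** years


def linear_stock(initial_stock: float, years: int, rate: float = 0.004) -> float:
    """Stock after `years` if `rate` of the *initial* stock is added each year."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial_stock * (1.0 + rate * years)


if __name__ == "__main__":
    start = 100.0  # hypothetical initial soil-carbon stock, arbitrary units
    for n in (1, 10, 20):
        print(f"year {n:2d}: compound {compound_stock(start, n):.2f}, "
              f"linear {linear_stock(start, n):.2f}")
```

Under either reading the gain over two decades is roughly 8%; the compound version pulls slightly ahead as the years accumulate.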
Crucially, the 4 per 1,000 initiative will help governments to implement sustainable intensification of food production (A. Chabbi et al. Nature Clim. Change 7, 307-309; 2017). Increased organic-carbon sequestration in soil underpins several Sustainable Development Goals (SDGs) and directly contributes to SDG2 ‘Zero hunger’, SDG13 ‘Climate action’ and SDG15 ‘Life on land’ (see go.nature.com/2kwtxsy).
To realize the promise of 
such an initiative, different 
sectors of society will need 
to stimulate and coordinate 
better communication between 
scientists, businesses, public and 
private enterprises, policymakers 
and the public. Soils must be 
recognized as natural capital that 
can contribute significantly to 
national economies and human 
welfare. 
Cornelia Rumpel CNRS, 
Institute of Ecology and 
Environmental Sciences Paris, 
Thiverval-Grignon, France. 
Johannes Lehmann Cornell 
University, Ithaca, New York, 
USA. 
Abad Chabbi INRA, Ecosys and 
3PE, Thiverval-Grignon and 
Lusignan, France. 
cornelia.rumpel@inra.fr 


Nile perch poached 
for swim bladders 


Nile perch (Lates niloticus) are being illegally fished in Lake Victoria, Africa’s biggest lake, driven by demand for their swim bladders from traditional Chinese medicine (see also Nature 551, 541; 2017).

Fishers can be paid ten times more for the bladder than the price they can achieve for fish flesh, so the flesh has become a by-catch of the bladder harvest. Large fish have large bladders, and so poachers target fish that are bigger than the 85-centimetre upper legal length limit; such fish are not accepted by regulated processing factories. Large fish are protected because they are substantial spawners, and removing them could affect stock recruitment.

Furthermore, fish-processing 
factories will not accept legally 
sized Nile perch carcasses that 
have already been opened to 
remove the swim bladder, so 
several factories have closed 
because the bladder trade has 
cut the supply of fish. This has 
reduced local employment 
and the volume of fish sold to 
export, affecting earnings from 
abroad. 

Fishery resources from Lake 
Victoria underpin the livelihoods 
of more than 35 million people, 
and fish products contribute 
about 2% to the combined gross 
domestic product of Tanzania, 
Kenya and Uganda (see 
go.nature.com/2bmpevq). 

Although the Nile perch 
is an introduced species in 
Lake Victoria and has severely 
affected natural fish abundance 
and biodiversity, it has brought 
some food security and 
economic prosperity to the 
region. Traditional medicine 
threatens both. 

Andrew Brierley University of 
St Andrews, Fife, UK. 
asb4@st-andrews.ac.uk 


CORRECTION 

The Outlook article ‘A bag of surprises’ (Nature 551, S40-S41; 2017) incorrectly identified fibronectin as a molecule produced by BCG bacteria. Fibronectin is produced by the human body and is a putative binding site for the BCG bacteria.
CLIMATE SCIENCE


Ocean thermometer from the past 


Noble gases dissolved in an ice core from Antarctica have revealed global mean ocean temperatures for 22,000-8,000 years 
ago with unprecedented accuracy, providing a crucial benchmark for refining climate models. SEE ARTICLE P.39 


RACHEL H. R. STANLEY 


Twenty thousand years ago, Earth was nearing the end of a glacial period. Gigantic ice sheets covered much of North America, Europe and Patagonia, air and water temperatures beyond the tropics were 4-23 degrees colder than today, depending on location, and atmospheric levels of carbon dioxide were approximately 35% lower. For reasons that are still unclear, the planet then transitioned to the warm, interglacial conditions that have lasted for about the past 11,000 years. On page 39, Bereiter et al. report that noble gases trapped in an ice core from Antarctica provide a record of past mean ocean temperature during this transition, with unprecedented accuracy (±0.25 °C) and high temporal resolution (250 years). This remarkable record will enable scientists to better formulate and update hypotheses on the transition between the last ice age and present-day warm conditions.

Much of the previously available information on ocean temperatures during the past thousands of years has come from records produced by organisms that lived in those times: for example, from differences in observed assemblages of the remains of marine biota, from ratios of metal ions within preserved shells, or from the arrangement of chemical bonds in lipid biomarkers called alkenones, all of which have a known temperature dependency. The temperatures obtained from these records are valuable, but are subject to uncertainties due to the complex responses of the organisms to biological and environmental processes. As a result, these temperature proxies are typically accurate to approximately 1 °C. This is a problem, because the mean temperature change of the ocean is thought to have been only about 3 °C.

By contrast, Bereiter and colleagues used a technically challenging method in which noble gases trapped in an ice core (Fig. 1) act as a proxy for temperature changes in the ocean. Noble gases are biologically and chemically inert, and therefore respond mainly to changes in physical conditions and processes rather than to biological ones. In particular, the solubilities of noble gases, especially those of the heavier gases such as krypton and xenon, depend on temperature.

Figure 1 | Ice core from the West Antarctic Ice Sheet. Measurements of noble gases trapped in the ice core have been used to construct a record of global mean ocean temperatures 22,000-8,000 years ago.

Gases are constantly exchanged between the ocean and atmosphere. As the ocean warms, krypton and xenon become less soluble in water, and so the ocean takes up less of these gases from the atmosphere. The amount of krypton and xenon in the atmosphere therefore increases. The elemental and isotopic ratios of these elements in air bubbles trapped in land ice thus provide a signal that can be used to deduce ocean temperature. Importantly, the laws that govern the physical processes underpinning this noble-gas proxy are more stable over time than those that govern the biological processes on which most other palaeotemperature proxies are based. Moreover, there is relatively little time delay between changes in ocean temperature and corresponding changes in the noble-gas signal, compared with many other proxies: the ‘lag-time’ of the noble-gas tracer is less than 100 years. Bereiter and colleagues’ temperature record is therefore more accurate and has greater temporal resolution than other records.
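The physical argument can be illustrated with a deliberately simplified two-reservoir mass balance. This is a toy sketch, not Bereiter and colleagues' method: a fixed inventory of gas is split between atmosphere and ocean, and the ocean's share scales with a solubility that falls as the water warms. The linear solubility law, the parameter values and the function names are all hypothetical, chosen only to show the direction and relative size of the effect.

```python
def atmospheric_share(temp_c: float, capacity_ref: float, sensitivity: float,
                      temp_ref: float = 0.0) -> float:
    """Fraction of a fixed gas inventory held in the atmosphere (toy model).

    The ocean holds C(T) times as much gas as the atmosphere, with a
    hypothetical linear solubility law
        C(T) = capacity_ref * (1 - sensitivity * (T - temp_ref)).
    Conservation, N_atm + N_ocean = N_total, then gives
        N_atm / N_total = 1 / (1 + C(T)).
    """
    capacity = capacity_ref * (1.0 - sensitivity * (temp_c - temp_ref))
    if capacity < 0.0:
        raise ValueError("temperature is outside the toy law's valid range")
    return 1.0 / (1.0 + capacity)


def relative_rise(capacity_ref: float, sensitivity: float,
                  warming_c: float) -> float:
    """Fractional increase in the atmospheric share after warming the ocean."""
    cold = atmospheric_share(0.0, capacity_ref, sensitivity)
    warm = atmospheric_share(warming_c, capacity_ref, sensitivity)
    return warm / cold - 1.0


if __name__ == "__main__":
    # Hypothetical parameters: gas "B" stands in for a heavier noble gas whose
    # solubility is more temperature-sensitive than that of gas "A".
    gases = {"A": (0.5, 0.01), "B": (2.0, 0.03)}
    for name, (capacity, sens) in gases.items():
        rise = relative_rise(capacity, sens, warming_c=2.5)
        print(f"gas {name}: atmospheric share rises by {100 * rise:.2f}%")
```

In this toy, warming always pushes gas into the atmosphere, and the more soluble, more temperature-sensitive gas responds more strongly, which is why the heavier noble gases carry most of the signal.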

The most valuable result of the authors’ research is the temperature record itself, which scientists can use to test their climate models and hypotheses. For example, the record reveals that the temperature difference between the cold glacial period and the warm interglacial (up until the industrial period) was 2.57 ± 0.24 °C, a number that models can now aim to replicate. Additionally, the high temporal resolution of the record means that model simulations can be checked at many time points during the transition, and can be used to explore interesting periods in the past in detail.

The most surprising revelation from the temperature record is the extent of ocean warming during an event called the Younger Dryas, which occurred about 13,000-11,500 years ago. This event was an interruption in the overall warming trend, during which scientists think that temperatures dropped by a few degrees in the Northern Hemisphere but continued to increase, perhaps even at an accelerated rate, in the Southern Hemisphere. Bereiter and colleagues report that the mean ocean temperature (which reflects the global ocean, but is weighted towards the Southern Hemisphere) increased substantially during the Younger Dryas, much more than had been estimated: the temperature increase was a whopping 1.6 °C in only 700 years. This is about 1.7 times faster than the ocean is warming now because of global climate change. The reasons for this large warming should be investigated.
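The rate comparison can be checked with simple arithmetic. The sketch below uses only the two figures quoted above (1.6 °C over 700 years, and the factor of 1.7); the present-day rate it prints is back-calculated from that ratio for illustration, not an independent measurement. It gives roughly 0.23 °C per century for the Younger Dryas and, by implication, roughly 0.13 °C per century for today's mean-ocean warming.

```python
def rate_per_century(delta_t_c: float, duration_years: float) -> float:
    """Average warming rate in degrees C per century."""
    if duration_years <= 0:
        raise ValueError("duration must be positive")
    return delta_t_c / duration_years * 100.0


if __name__ == "__main__":
    younger_dryas = rate_per_century(1.6, 700.0)  # 1.6 C over 700 years
    implied_today = younger_dryas / 1.7            # '1.7 times faster' than today
    print(f"Younger Dryas mean-ocean warming: {younger_dryas:.3f} C per century")
    print(f"Implied present-day rate: {implied_today:.3f} C per century")
```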

The authors also show that the ocean warmed faster than the atmosphere during the Younger Dryas, and then stopped warming before the atmosphere did. By contrast, there was a remarkable synchronicity between Antarctic air temperature, mean ocean temperature and atmospheric CO2 levels at all other times in the new record. Researchers must now find an explanation for the unusual asynchronicity during the Younger Dryas.

Bereiter and colleagues’ work provides an unambiguous record of the average temperature of the entire ocean, from the surface to the greatest depths. However, it does not directly quantify surface temperatures (either global average surface temperature or sea surface temperatures), both of which are useful for understanding and quantifying glacial-interglacial temperature differences and processes. The authors do provide a rough estimate of average surface temperatures from their data, by using a cohort of models to estimate the ratio between sea surface temperature and mean ocean temperature. But this constrains surface temperatures only weakly, highlighting the need for more work in this area.

The authors present several fascinating hypotheses that stem from their data. For example, the observed synchronicity of mean ocean temperature with atmospheric CO2 levels and Antarctic air temperatures leads Bereiter et al. to conclude that the Southern Hemisphere drove Earth out of the glacial period. Furthermore, the large warming during the Younger Dryas suggests that changes in ocean dynamics beyond simple changes to the Atlantic Meridional Overturning Circulation (a climatically crucial component of ocean circulation) could be the cause of this climatic event. Climate modellers must now test these and other hypotheses by adding processes and feedbacks to their climate models, to see how the resulting ocean-temperature changes compare with those in the authors’ noble-gas-derived record. Much work will be needed to exploit the full potential of this beautiful record.


Rachel H. R. Stanley is in the Department 
of Chemistry, Wellesley College, Wellesley, 
Massachusetts 02481, USA. 

e-mail: rachel.stanley@wellesley.edu 
A bloody brake on
myelin repair


In multiple sclerosis, the blood-coagulation factor fibrinogen can enter the brain. 
It emerges that fibrinogen inhibits the maturation of cells called oligodendrocytes 
that repair nerve-fibre insulation and maintain neuronal communication. 


KLAUS-ARMIN NAVE 
& HANNELORE EHRENREICH 


Multiple sclerosis (MS) is a debilitating disease in which the body’s immune system destroys the myelin sheath that provides electrical insulation for nerve fibres. Myelin repair subsequently fails owing to a lack of new myelin-producing cells called oligodendrocytes, and this contributes to an irreversible loss of neuronal projections called axons. Why oligodendrocyte precursor cells (OPCs) located at sites of MS-related tissue damage fail to differentiate into oligodendrocytes has been poorly understood. Writing in Neuron, Petersen et al. report that a blood-coagulation factor called fibrinogen (which enters the brain when the blood-brain barrier is damaged in MS) puts a brake on OPC differentiation. This insight offers hope for future treatment strategies.
Myelin is made by oligodendrocyte processes that spiral around axonal segments, and it forms a multilayered membrane sheath that speeds up electrical conduction. Oligodendrocyte processes also support axon metabolism. Myelin growth is a fast process in which oligodendrocyte mass multiplies in just a few days. In mammals, myelination begins around birth and OPCs are maintained throughout life; myelination in the cortex of the adult brain is thought to contribute to learning and higher brain functions. Orchestrating timely OPC generation, oligodendrocyte differentiation and energy-demanding myelin synthesis under changing metabolic conditions and in phases of physiologically low oxygen levels is a major challenge. Unsurprisingly, OPCs must integrate a plethora of external stimuli to determine when to differentiate.

Similarly, myelin repair following acute brain injury depends on optimal timing of OPC proliferation and differentiation. Unless cellular debris and blood clots are cleared and vascular blood supply is reinstated, remyelination will fail. Thus, it is plausible that blood-borne signalling proteins, such as coagulation factors deposited at sites of physical damage, are detected by OPCs and act as surrogate markers of ongoing repair of the primary injury. This could put differentiation on hold until the damaged environment is ready for remyelination.

Figure 1 | A coagulation factor and multiple sclerosis (MS). In MS, neuronal projections called axons are stripped of their insulating myelin sheath. Subsequent myelin repair often fails, but the reason for this has been unclear. The blood-coagulation factor fibrinogen crosses the blood-brain barrier (composed of endothelial cells lined with the termini of cells called astrocytes) in MS, and Petersen et al. provide evidence that fibrinogen acts to inhibit myelin repair. They show that it binds to the receptor protein ACVR1 on the surface of oligodendrocyte precursor cells (OPCs), triggering an intracellular signalling cascade in which bone morphogenetic protein (BMP) activates the transcription factor ID2. BMP signalling prevents OPCs from differentiating into mature oligodendrocytes, which would produce myelin and so drive myelin repair.

Demyelinated areas that arise in MS can also be considered local ‘brain injuries’. Although there is no bleeding and subsequent blood clotting involving coagulation factors in MS, chronic inflammation causes a persistent opening of the blood-brain barrier (BBB), across which these factors might pass in large amounts. Could the permanent entry of blood-borne coagulation factors prevent OPC differentiation and myelin repair?

With this question in mind, Petersen et al. revisited the observation that the soluble glycoprotein fibrinogen, which is abundant in blood plasma, is deposited in demyelinated brain regions. First, the authors added physiological concentrations of fibrinogen to OPCs in cell culture, and showed that this coagulation factor strongly inhibited OPC differentiation and prevented axon myelination. Among the many genes in OPCs whose expression was affected by fibrinogen, the researchers detected upregulation of members of a signalling pathway known to inhibit oligodendrocyte differentiation: genes encoding bone morphogenetic proteins (BMPs) and their downstream effectors, including the transcription factor ID2. Indeed, Petersen and colleagues showed that fibrinogen and ID2 could be readily visualized in regions in which remyelination had failed in the brains of people who had died with MS.

Interestingly, the authors found that OPCs exposed to fibrinogen either in vitro or in the brains of live mice often underwent a developmental switch to become a different neuron-supporting cell type called an astrocyte. This raises the possibility that astrocytic scars (a form of tissue growth that occurs in response to injury in MS brains and that might prevent myelin repair) arise from a switch in OPC identity. Such a hypothesis will require testing in mouse models of MS.

Fibrinogen drives the activation of brain-specific immune cells, which can indirectly inhibit remyelination. However, the effects reported by Petersen and co-workers are direct: they result from fibrinogen binding to the BMP type I receptor protein ACVR1 on the surface of OPCs to stimulate the BMP signalling cascade in these cells (Fig. 1). This is of interest because inhibitors of BMP signalling have already been developed. Indeed, the authors provide evidence that one such inhibitor can counteract the detrimental effects of fibrinogen on OPC differentiation, pointing to a possible avenue for therapy.

In addition, fibrinogen itself might be a drug target. Petersen et al. show that the fibrinogen-cleaving enzyme ancrod, an anticoagulant from snake venom that has been proposed (although not approved) as a treatment for ischaemic stroke, enhanced the remyelination of demyelinated axons. A mouse model of MS has previously been shown to benefit from ancrod and fibrinogen depletion, owing in part to anti-inflammatory effects. However, it is possible that myelin repair is also improved in these animals. Regardless of the relative contributions of indirect and direct effects of ancrod on OPCs, clinical tests would be needed to determine the drug’s efficacy in people with MS. Unfortunately, given that the drug is off-patent, such trials are unlikely to find support in the pharmaceutical industry.

It is becoming apparent that coagulation factors do much more than simply act in the blood-coagulation cascade. The research group that performed the current study has previously shown that the enzyme thrombin, which cleaves fibrinogen to produce fibrin, is activated in demyelinated tissue. This leads to the formation of large fibrin complexes, which are equivalent to blood clots. Moreover, tissue plasminogen activator protein, which is routinely given to people who have had an ischaemic stroke to promote the breakdown of fibrin-containing blood clots, inhibits the death of oligodendrocytes and promotes axonal regeneration. One must assume that these factors, like fibrinogen, access the brain in the absence of a functional BBB, and have roles in determining the success or failure of myelin repair. And although fibrinogen is apparently not expressed in the brain, other coagulation factors are. Their uncontrolled transfer from the blood when the BBB leaks will no doubt perturb the ‘coagulation-unrelated’ functions of these factors in the brain; these effects await exploration.

If a compromised BBB is an entry port for blood-borne inhibitors of myelination, does fibrinogen entry reduce cortical myelination and affect higher brain functions in chronic conditions other than MS? The brains of people with Alzheimer’s disease have a leaky BBB and show fibrinogen infiltration. Individuals carrying a form of the APOE gene that increases the risk of Alzheimer’s disease display reduced BBB integrity, and this variant has been associated with age-dependent myelin breakdown. Petersen and colleagues’ findings might thus have implications beyond MS, and these should be investigated soon.
Nanoscale interfaces
made easily


Methods for making interfaces between atomically thin sheets of materials might 
open the way to a range of nanotechnologies. A practically simple method has been 
reported, based on the cyclical switching of gaseous reagents. SEE LETTER P.63 


WEIJIE ZHAO & QIHUA XIONG 


Atomically thin sheets of semiconducting materials, known as two-dimensional (2D) semiconductors, have outstanding potential for making low-power, high-speed electronic and optoelectronic devices, including flexible electronics. Such applications often require heterostructures: interfaces formed between two or more 2D semiconductors, which can either stack on top of each other (vertical heterostructures) or be joined at their edges (lateral heterostructures). Versatile and scalable techniques for the mass production of heterostructures are therefore required. On page 63, Sahoo et al. report a substantial advance that allows the controllable growth of seamless, high-quality lateral heterostructures made from widely studied 2D semiconductors known as transition-metal dichalcogenides (TMDs).

Figure 1 | A strategy for growing lateral multi-junction heterostructures. Interfaces between the edges of atomically thin sheets of different semiconductors are called lateral heterostructures, and have potential technological applications. Sahoo et al. report a method for making lateral heterostructures from compounds known as transition-metal dichalcogenides (TMDs), which include molybdenum disulfide (MoS2) and tungsten disulfide (WS2). The authors heat a mixture of two powdered TMDs in a furnace, and pass carrier gases over them (coloured arrows). The carrier gases react with the TMDs to produce gaseous intermediates (not shown), which then react on the surface of a substrate to deposit sheets of the TMDs. When a mixture of nitrogen and water vapour is used as the carrier gas, only MoS2 forms. When the carrier gas is switched to a mixture of hydrogen and argon, the growth of MoS2 is terminated and WS2 grows at the edge of the pre-grown MoS2. By switching cyclically between the carrier gases, 2D multi-junction heterostructures are produced.

Transition-metal dichalcogenides have the general formula MX2, in which M is molybdenum (Mo) or tungsten (W) and X can be sulfur (S) or selenium (Se). Lateral TMD heterostructures can be constructed by ‘stitching’ the edges of two TMD sheets together using covalent bonds. In the past few years, there has been a flurry of papers reporting methods for synthesizing TMD lateral heterostructures using edge epitaxial growth, a method that allows a second TMD to grow at the edge of another, pre-grown TMD crystal. These heterostructures can be fabricated into p-n junctions, which conduct current in only one direction (a property known as rectification) and constitute one of the building blocks of modern electronic and optoelectronic devices. Two-dimensional p-n junctions hold great promise for the development of atomically thin devices such as light-emitting diodes, solar cells and integrated circuits (chips).

Lateral TMD heterostructures have previously been made in one-step procedures that lacked the flexibility to make multi-junction heterostructures or more than one type of heterostructure, or in two-step or multi-step processes that involve many changes of TMD precursors and reaction chambers. Sahoo and colleagues’ method overcomes those constraints in a ‘one-pot’ procedure, a process that allows several steps to be performed in one reaction chamber. One of the many advantages of their strategy is the operational simplicity with which different TMDs can be selectively grown.

The authors’ approach builds on a method known as chemical-vapour deposition (CVD), in which a substrate is exposed to gaseous precursor compounds (sometimes mixed with carrier gases) that react or decompose on the substrate to deposit the targeted solid product at an optimal temperature and pressure. The researchers found that 2D MoX2 and WX2 can be grown sequentially from a mixture of powders of the two compounds, thus forming lateral heterostructures, simply by switching the carrier gases in the CVD growth chamber (Fig. 1).

The secret to success lies in the intriguing and complicated chemical reactions that occur between the carrier gases and the powdered TMD solids. The reactions produce highly volatile species such as hydroxides and oxides, which undergo redox reactions at distinct rates to deposit MoX2 or WX2 selectively, depending on the carrier gases used. When the carrier is a mixture of nitrogen and water vapour, the growth of only MoX2 is promoted. But when the carrier is switched to a mixture of hydrogen and argon, the volatile molybdenum compounds are quickly depleted by reactions with the hydrogen, so that only WX2 grows. By switching carrier gases multiple times, as many alternating domains of MoX2 and WX2 as desired can be prepared, corresponding to a sequence of lateral heterostructures.
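The selectivity rule just described can be captured in a few lines. This is a simplified sketch, assuming that each carrier-gas phase deposits exactly one material and ignoring temperature, pressure and precursor depletion; the gas labels, the mapping name and the function names are hypothetical, not taken from Sahoo and colleagues' work.

```python
from itertools import groupby

# Idealized selectivity rule from the text: each carrier gas favours one TMD.
CARRIER_TO_DOMAIN = {
    "N2+H2O": "MoX2",  # nitrogen plus water vapour: only the Mo compound grows
    "H2+Ar": "WX2",    # hydrogen plus argon: Mo species depleted, W compound grows
}


def domain_sequence(schedule: list[str]) -> list[str]:
    """Domains grown, from the first-grown core outwards.

    Consecutive phases with the same carrier gas extend the same domain,
    so they are merged into one entry.
    """
    grown = [CARRIER_TO_DOMAIN[gas] for gas in schedule]
    return [material for material, _ in groupby(grown)]


def junction_count(schedule: list[str]) -> int:
    """Number of lateral interfaces between adjacent, different domains."""
    return max(len(domain_sequence(schedule)) - 1, 0)


if __name__ == "__main__":
    schedule = ["N2+H2O", "H2+Ar", "N2+H2O", "H2+Ar"]
    print(domain_sequence(schedule), junction_count(schedule))
```

Merging repeated phases with `groupby` reflects the point that only a change of carrier gas creates a new interface; two consecutive nitrogen/water phases simply extend the same MoX2 domain.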

Sahoo and co-workers used high-resolution transmission electron microscopy to show that some types of junction in their heterostructures were seamless and atomically sharp. They also used spectroscopic techniques to confirm the alternating pattern of TMD domains, to verify that each domain contains just one type of TMD, and to show that the junctions in the heterostructures are made reproducibly.
50 Years Ago

As good trade unionists know, wage claims at times of economic belt-tightening are no more successful than whistling in a blizzard. However sweet the music sounds, it never carries far. The Association of University Teachers is far from being a trade union; if it were, it would probably not have persisted with its claim that teachers in universities are underpaid. The British Government has rewarded the association for its pains by asking the Prices and Incomes Board, a notoriously unsentimental body, to undertake a survey of university salaries ... if the Prices and Incomes Board should conclude that there are no grounds for an increase, that is likely to be an end to the matter. And once the board ... have the bit between their teeth, no government is going to feel moved to set up a review body more sympathetic to the teachers.

From Nature 6 January 1968


100 Years Ago

The Science Museum, South Kensington, was re-opened to the public on Tuesday, January 1. The museum has been closed to the public for nearly two years; it has, however, been open without interruption for students. As compared with 1914 conditions, the extent and the hours of opening for 1918 are somewhat reduced, but the greater part of the museum will be open free on every weekday from 10 a.m. to 5 p.m. ... The collections contain many unique objects of great interest as representing discoveries, inventions, and appliances that have been of first-rate importance in the advancement of science and of industry. Such objects as Watt's engines, early locomotives, steamships ... and textile machinery are records of British contributions to the progress of the world.

From Nature 3 January 1918
The authors went on to demonstrate that their technique could be used to make multi-junction lateral heterostructures for compounds known as TMD ternary alloys (which contain one type of metal, but a mixture of sulfur and selenium atoms). To do this, the authors used a powdered mixture of MoSe2 and WS2, or of MoS2 and WSe2 (rather than a mixture of MoS2 and WS2, or of MoSe2 and WSe2, as in their first experiments). This produced high-quality, 2D lateral heterostructures consisting of domains containing the alloys $\mathrm{MoS}_{2(1-x)}\mathrm{Se}_{2x}$ or $\mathrm{WS}_{2(1-x)}\mathrm{Se}_{2x}$ (where $x$ is a number between 0 and 1). The optical and electrical properties of such heterostructures could now be fine-tuned by altering the alloy composition.
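The alloy formula fixes how the two chalcogens share the sites: each formula unit carries $2(1-x)$ sulfur and $2x$ selenium atoms. A minimal sketch of that bookkeeping follows, together with the standard composition-weighted interpolation (with an optional bowing term) often used as a first estimate of how a property such as a bandgap varies with $x$. The interpolation model, the function names and the placeholder endpoint values are assumptions for illustration; real endpoint properties and bowing parameters would have to come from measurements.

```python
def chalcogen_counts(x: float) -> tuple[float, float]:
    """Sulfur and selenium atoms per formula unit of an MS_{2(1-x)}Se_{2x} alloy."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    return 2.0 * (1.0 - x), 2.0 * x


def interpolate_property(x: float, sulfide_value: float, selenide_value: float,
                         bowing: float = 0.0) -> float:
    """Composition-weighted estimate with an optional bowing correction:
    P(x) = (1 - x) * P_sulfide + x * P_selenide - bowing * x * (1 - x).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    return ((1.0 - x) * sulfide_value + x * selenide_value
            - bowing * x * (1.0 - x))


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 1.0):
        s, se = chalcogen_counts(x)
        # Endpoint values 1.0 and 2.0 are placeholders, not measured data.
        print(f"x={x}: S={s}, Se={se}, property={interpolate_property(x, 1.0, 2.0):.2f}")
```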

The authors conducted preliminary 
electrical characterizations of single-junction 
heterostructures produced using their method. 
They observed that planar p-n junctions that
formed at the boundaries of electron-doped
MoX2 (made by adding a small number of elec-
trons to MoX2) and hole-doped WX2 (formed
by removing a few electrons from WX2) show
good rectification behaviour, which is a further
indication of the high quality of the hetero-
structures. They also observed photodiode
behaviour: the generation of a substantial
current when the junction area was illumi-
nated by light. Having the ability to build such
tiny p-n diodes and photodiodes holds great
potential for future efforts to miniaturize 
electronic and optoelectronic devices. 
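Rectification and photodiode behaviour can be illustrated with the ideal (Shockley) diode equation, with a photocurrent term subtracted under illumination. This textbook idealization is assumed here purely for illustration; the saturation current, ideality factor and photocurrent values are hypothetical and do not describe the devices measured by Sahoo and colleagues.

```python
import math

THERMAL_VOLTAGE = 0.02585  # kT/q at about 300 K, in volts


def diode_current(v: float, i_sat: float = 1e-12, ideality: float = 1.5,
                  i_photo: float = 0.0) -> float:
    """Ideal-diode current (A) at bias v (V); i_photo > 0 models illumination."""
    return i_sat * (math.exp(v / (ideality * THERMAL_VOLTAGE)) - 1.0) - i_photo


def rectification_ratio(v: float, **params) -> float:
    """Forward-to-reverse current ratio |I(+v)| / |I(-v)| (dark by default)."""
    return abs(diode_current(v, **params)) / abs(diode_current(-v, **params))


print(f"rectification ratio at 1 V: {rectification_ratio(1.0):.3g}")
# Under illumination, current flows even at zero bias (the short-circuit current).
print(f"zero-bias current, illuminated: {diode_current(0.0, i_photo=1e-9):.3g} A")
```

A large forward-to-reverse ratio is what "good rectification" means in practice; real devices deviate from this ideal through series resistance, leakage and other non-idealities.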

Sahoo and co-authors’ method opens up 
a promising route for the synthesis of high- 
quality lateral heterostructures. Insights into 
the thermodynamics and chemistry oper- 
ating at the atomic scale in this process are 
now needed to develop the ability to prepare 
heterostructures involving any desired com- 
bination of TMDs. Moreover, research must 
be performed to work out why interfaces that
switch from MoX2 to WX2 are not as sharp
as those in which WX2 switches to MoX2,
and to optimize the production of sharper
MoX2-WX2 interfaces.

It will also be important to explore variations 
of the technique that might allow the growth 
of lateral heterostructures between MX2 and
other exotic 2D materials, including those 
that have metallic, semi-metallic or super- 
conducting properties’, to make new types 
of device. The availability of complex TMD 
heterostructures — including those that have 
several junctions in series — should also 
allow the exploration of fundamental phys- 
ics, such as the mechanism by which charge 
transfer occurs at interfaces. Lastly, Sahoo and 
co-workers’ technique will enable the develop- 
ment of proof-of-concept prototype devices, 
to advance our knowledge of the viability and 
scope of 2D technologies.
More than one way to a
central nervous system


Have the molecular mechanisms that are linked to the developmental 
organization of centralized nervous systems evolved once or multiple times? 
Evidence from nine animal species points to the latter. SEE ARTICLE P.45 


CAROLINE B. ALBERTIN 
& CLIFTON W. RAGSDALE 


Animal nervous systems come in many
shapes and sizes, ranging from a
handful of neurons to large, complex
brains. A key question has been whether the 
centralized nervous systems found in many 
bilaterally symmetrical animals (bilaterians), 
which include vertebrates and insects, share 
a common evolutionary origin, or evolved 
more than once. At a superficial level, both 
flies and vertebrates boast a brain connected 
to a single nerve cord that extends into the 
trunk. In addition, molecular data indicate 
that key regulatory genes are deployed simi- 
larly during nervous-system development in 
vertebrates, flies! and another bilaterian, a seg- 
mented worm (an annelid)”. These similarities 
have been interpreted as evidence for evolu- 
tionary conservation of an ancient bilaterian 
developmental program for centralized nerv- 
ous systems. But in a paper online in Nature, 
Martin-Duran and colleagues® provide evi- 
dence for the independent evolution of such 
nervous systems. 

The evolutionary steps between a nerve net 
and the elaborate centralized nervous systems 
of bilaterians have been an area of active inter-
est for more than a century’. In the mid-1980s,
our ability to study this process received a
boost, thanks to the discovery of a large family
of genes that encode transcription factors con- 
taining a DNA-binding homeobox domain’. 
It emerged that members of this homeobox- 
gene family, including the Hox complex, are 
expressed in the same order along the head-
to-tail (anterior-posterior) axis during devel-
opment in many distantly related bilaterians,
including flies and vertebrates®. It was later 
shown that a signalling pathway governed by 
genes that encode bone morphogenetic pro- 
teins (BMPs) is needed to establish the dor- 
sal-ventral (back-to-belly) body axis in diverse 
bilaterians’. 

Given these insights, it was not surprising 
to find that a suite of homeobox genes is also 
expressed in strikingly similar patterns along
the dorsal-ventral axis of the developing
nervous systems of vertebrates and fruit flies’. 
Along this axis, staggered homeobox-gene 
expression correlates with the development of 
specific neuron types in different regions. The 
discovery that these genes are also expressed 
along the dorsal-ventral nervous-system axis 
in Platynereis dumerilii (an annelid distantly 
related to flies and vertebrates) was seen 
as evidence that bilaterian nerve cords are 
evolutionarily conserved’.

Advances in phylogenetic methods for 
analysing evolutionary relationships, coupled 
with broader sampling across the evolution- 
ary tree, have altered our understanding of 
animal relationships. In 2016, a phylogenetic
analysis identified an assemblage of small,
bilaterally symmetrical, simple worms, col- 
lectively referred to as xenacoelomorphs, as 
the sister group to the rest of the bilaterians 
(nephrozoans)’. Because xenacoelomorphs are 
the closest living relatives to the nephrozoans 
(Fig. 1), comparisons between these two groups 
can help researchers to infer traits present in the 
last common ancestor of all bilaterians. 

Xenacoelomorphs display diverse 
nervous-system arrangements. Some have 
only a nerve net, like the closest relatives of 
bilaterians, the cnidarians (jellyfish and sea 
anemones). Others also have one or more 
nerve cords that are located dorsally, ventrally 
or at multiple positions along the dorsal- 
ventral axis. Martin-Duran and colleagues 
investigated the expression of patterning genes 
in four xenacoelomorph species. They found 
that, although the expression of BMP and 
anterior-posterior homeobox genes in these
species was consistent with patterns seen in 
other bilaterians, the expression of the dor- 
sal-ventral homeobox genes in the nervous 
system was not. 

Figure 1 | Evolution of animal nervous systems. The bilaterians (animals that show bilateral symmetry)
consist of nephrozoans and a sister group, xenacoelomorphs. Many nephrozoans and xenacoelomorphs
have centralized nervous systems, unlike their closest relatives, cnidarians, which feature a simple nerve
net. A suite of homeobox genes is expressed (stars) along the back-to-belly axis of the central nervous
systems of vertebrates, flies and a segmented worm (an annelid), and it has been posited that this is an
evolutionarily conserved gene-expression pattern guiding the development of a centralized nervous
system that originated from a common bilaterian (blue circle) or nephrozoan (purple circle) ancestor.
However, Martin-Duran et al.* did not find this pattern in nine bilaterian species: five spiralians
and four xenacoelomorphs (red text). Their data strengthen the case that the developmental and
morphological similarities between bilaterian centralized nervous systems are the result of independent
evolutionary events that converged on similar outcomes.

Martin-Duran et al. next investigated
dorsal-ventral patterning in the nephro-
zoans. For this work, they extended their
analysis of dorsal-ventral homeobox genes
to five species within the Spiralia, a large but
little-studied bilaterian group that includes
annelids, flatworms and molluscs (Fig. 1). The
researchers found that the anticipated dorsal-
ventral homeobox pattern was rarely observed,
even in part, in the nervous systems of these 
species, including in an annelid closely related 
to P. dumerilii. These results suggest that even 
closely related species that have similar nerv- 
ous-system architectures can deploy ancient 
genes very differently. 

Previous studies in acorn worms® 
(hemichordates) and flatworms’ found no 
dorsal-ventral homeobox-gene expression 
in their trunk nervous systems. This absence 
was previously interpreted as a secondary 
loss of an ancestral neural patterning system. 
But in light of Martin-Duran and colleagues’ 
data, this condition could, in fact, reflect the 
ancestral nephrozoan state. It now seems that 
the ‘typical’ dorsoventral gene network was 
not deployed in the nervous system of the last 
common ancestor of bilaterians or nephrozo- 
ans. Rather, the developmental mechanisms 
that pattern the neural cords in mice, flies and 
P. dumerilii might have evolved convergently. 
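Whether a trait shared by distant lineages was inherited or gained independently is the kind of question that parsimony-based ancestral-state reconstruction addresses. The sketch below applies Fitch's small-parsimony algorithm to a deliberately simplified toy tree. The coarse topology, the "other annelid" leaf standing in for the close relative of P. dumerilii mentioned above, and the presence/absence coding (1 = nerve-cord dorsal-ventral homeobox pattern reported; the "rarely observed, even in part" results collapsed to 0) are all assumptions for illustration. This is not Martin-Duran and colleagues' analysis.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A tree is either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[FrozenSet[int], int]:
    """Bottom-up Fitch pass.

    Returns the set of most-parsimonious states at the root of `tree`
    and the minimum number of state changes needed on that subtree.
    """
    if isinstance(tree, str):
        return frozenset({states[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Toy, simplified topology (assumed for illustration).
NEPHROZOA: Tree = (
    ("vertebrates", "acorn worms"),                               # deuterostomes
    ("flies", (("P. dumerilii", "other annelid"), "flatworms")),  # protostomes
)
BILATERIA: Tree = ("xenacoelomorphs", NEPHROZOA)
TREE: Tree = ("cnidarians", BILATERIA)

# Simplified presence (1) / absence (0) coding.
STATES = {
    "cnidarians": 0, "xenacoelomorphs": 0, "vertebrates": 1, "acorn worms": 0,
    "flies": 1, "P. dumerilii": 1, "other annelid": 0, "flatworms": 0,
}

print("nephrozoan node:", sorted(fitch(NEPHROZOA, STATES)[0]))
root_set, changes = fitch(TREE, STATES)
print("root:", sorted(root_set), "minimum changes:", changes)
```

With this coding the nephrozoan node is ambiguous on the bottom-up pass, but the outgroups fix the root at 0, so a top-down pass would assign 0 to the nephrozoan ancestor as well. The minimum of three changes is consistent with three independent gains. Recoding even one taxon can alter this outcome.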

Martin-Duran and colleagues’ work paints a 
complex and nuanced picture of nervous-sys- 
tem evolution. Their data raise the possibility 
of multiple origins of animal nerve cords,
and suggest that a suite of genes that pattern
the dorsal-ventral axis has been repeatedly
co-opted into nervous-system development. 
Indeed, the authors show that the relation- 
ship between an animal’s morphology and 
the expression of particular developmen- 
tal genes might not always be tightly linked. 
These insights raise exciting questions about 
the mechanisms of evolutionary change 
that underlie the development of morpho- 
logical diversity, including why convergently 
evolved nervous systems sometimes use highly 
conserved suites of genes, and what develop- 
mental constraints govern variations in these 
mechanisms across animals. 

A frequent criticism of the study of key 
model organisms such as fruit flies, mice and 
nematode worms is that these species are 
highly derived — that is, they contain many 
traits unique to them — and thus are unlike 
any distant ancestor. But all living species are 
highly derived, being shaped by natural and 
sexual selection on evolutionary timescales 
to maintain adaptation to varying ecological 
niches. What Martin-Duran and co-workers 
have highlighted is not that these model organ- 
isms are inappropriate ‘reference species’””.
Rather, they demonstrate the importance both
of developing reference species for multiple 
groups within a robust phylogenetic frame- 
work, and of consistently examining close 
relatives of the reference species before draw- 
ing conclusions about the evolutionary history 
of shared features.
Ultrasound approach
tracks gut microbes 


Monitoring microbes that live deep inside the gut is a challenge. Engineering 
bacteria to express structures that can be tracked by ultrasound offers a way to 
locate such cells in vivo, and might have clinical implications. SEE LETTER P.86 


RICARD SOLE & NURIA CONDE-PUEYO 


A microbial ecosystem exists inside
you that is as rich and complex as the
rainforest. Like the rainforest, this
ecosystem contains inaccessible realms that 
are usually hidden from view. When trying to 
observe the living gut, a major problem is that 
light-based imaging techniques can monitor 
only a limited depth below the surface. How- 
ever, on page 86, Bourdeau et al. report an 
ultrasound approach for exploring this inner 
world that they use to map the in vivo location 
of specific microbial-cell populations. Some 
medical approaches currently in use or being 
developed introduce bacterial cells as a therapy 
for gut disease or cancer, so this ultrasound 
technique might be adapted for clinical use to 
determine whether such cells have reached the 
desired location. 

Microbial communities have been coevolving 
with humans over millions of years’, and they 
display notable spatial and temporal regulari- 
ties in their organization. This natural ecosys- 
tem assembles at birth, develops, responds to 
perturbations and stress, and can sometimes 
collapse. Yet determining the laws and fra- 
gilities of life deep within the gut has been 
difficult, and even some of the best whole- 
body imaging techniques available can reveal 
structures at depths of only centimetres below 
the surface**. 

Bourdeau and colleagues offer an innova- 
tive solution. Ultrasound imaging has so far 
mainly been used to assess tissues, but the 
authors reveal that it can also be used to effi- 
ciently track populations of bacterial cells that 
have been genetically engineered to express 
what they term acoustic reporter genes. These 
encode components that form intracellular, 
protein-enclosed, gas-filled structures called 
gas vesicles, which are naturally present in
many microorganisms, in which they control
buoyancy in aqueous environments’.

Ultrasound detection involves directing
pulses of sound waves towards a sample and 
monitoring the reflected echoes, which are 
affected by density differences in the sub- 
stances that the sound passes through. Gas 
vesicles scatter sound waves, and organisms
containing them can be monitored using 
ultrasound’. Pressure pulses above a certain 
level cause gas-vesicle collapse; therefore, 
ultrasound signals that disappear after such 
pulses can be inferred to have originated from 
gas vesicles®, an approach that could be used 
to enhance signal detection above background 
levels (Fig. 1). 
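Collapse-based detection attributes to gas vesicles only the echo that disappears after a collapsing pulse, which can be sketched as a simple difference image. This is a minimal illustration under stated assumptions: the tissue echo is unchanged by the pulse, vesicle collapse is complete, frame noise is independent and Gaussian, and the k-sigma threshold (k = 6 here) is an arbitrary choice. It is not the processing pipeline used by Bourdeau et al.

```python
import numpy as np


def collapse_difference(pre: np.ndarray, post: np.ndarray,
                        noise_sigma: float, k: float = 6.0):
    """Attribute to gas vesicles the echo that disappears after a collapsing pulse.

    pre, post: echo-intensity images acquired before and after the pulse.
    noise_sigma: standard deviation of the difference image from noise alone.
    Returns the difference image and a boolean mask of pixels whose drop
    exceeds k * noise_sigma.
    """
    diff = pre - post
    return diff, diff > k * noise_sigma


rng = np.random.default_rng(0)
shape = (32, 32)
tissue = 1.0 + 0.05 * rng.standard_normal(shape)  # tissue echo, unaffected by the pulse
vesicles = np.zeros(shape)
vesicles[10:16, 10:16] = 0.5                       # hypothetical patch of engineered cells
frame_noise = 0.02
pre = tissue + vesicles + frame_noise * rng.standard_normal(shape)
post = tissue + frame_noise * rng.standard_normal(shape)  # vesicles have collapsed
# The difference of two independent noisy frames has standard deviation sigma * sqrt(2).
diff, mask = collapse_difference(pre, post, noise_sigma=frame_noise * np.sqrt(2))
print("pixels attributed to gas vesicles:", int(mask.sum()))
```

Because the static tissue echo cancels in the subtraction, even a weak vesicle signal can stand out against a bright background.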

Figure 1 | Using ultrasound to monitor the in vivo dynamics of cell populations in space and
time. a, Bourdeau et al.' genetically engineered bacteria to express what they term acoustic reporter
genes (ARGs), which encode the components of hollow structures called gas vesicles that scatter sound
waves and generate an echo that can be detected by ultrasound. Pressure-pulse application causes gas-
vesicle collapse and disappearance of the ultrasound signal, which can be used to improve signal detection
when tracking the location of cells containing gas vesicles. This approach enables in vivo monitoring of
a cell population deep within the mouse gut that cannot be tracked by light microscopy. b, The authors
engineered two types of gas vesicle (red and blue) that collapse at different pressure-pulse levels, enabling
cells containing these vesicles to be distinguished using ultrasound. One possible application of this work
might be to introduce two bacterial strains that each contain one type of these gas vesicles into a mouse.
This would enable non-invasive in vivo temporal and spatial monitoring of the dynamics of two distinct
bacterial populations in the gut in regions such as the small intestine or colon.

There had been no previous tests to discover
whether cells that do not normally form gas
vesicles could be genetically engineered to
do so, allowing such cells to be monitored by
ultrasound. Bourdeau et al. engineered types of
microorganism currently being used or devel-
oped as therapeutics to express gas-vesicle
components. One of these microbes was a non-
pathogenic strain of the bacterium Escherichia
coli that is given to some people who have a gut
infection’. Another was Salmonella enterica
Typhimurium bacteria, which can invade
tumours. Mouse models of tumour invasion
by S. enterica Typhimurium are being investi- 
gated to determine the potential for using such 
bacteria to release tumour-killing drugs*”. 

The authors introduced engineered bacteria 
that expressed gas vesicles into the mouse gut 
and showed that the ultrasound-imaging tech- 
nique works efficiently even for highly diluted 
cellular populations — signals were detected 
for E. coli cells present at a concentration of 
5 x10’ cells per millilitre. The authors also 
demonstrated that they could engineer bacte- 
rial strains that generate distinguishable ultra- 
sound signals, enabling two different bacterial 
populations to be monitored simultaneously 
by using strains containing gas vesicles that 
collapse at different pressure-pulse levels. 
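Distinguishing two populations by their collapse pressures can be sketched as sequential subtraction: one image is taken before any pulse, one after a pulse that collapses only the weaker vesicles, and one after a stronger pulse that collapses the rest. The profiles, the two-step pulse sequence and the assumption of complete, noise-free collapse are illustrative only.

```python
import numpy as np


def unmix_by_collapse(frames):
    """Separate vesicle populations using images taken before and after
    successively stronger collapsing pulses.

    frames[0] is acquired before any pulse and frames[i] after the i-th pulse,
    with pulses ordered from the lowest to the highest collapse pressure.
    The echo lost at step i is attributed to the population whose vesicles
    collapse at that pulse level.
    """
    frames = np.asarray(frames, dtype=float)
    return frames[:-1] - frames[1:]


# Hypothetical 1D intensity profiles along the gut (arbitrary units).
background = np.full(8, 1.0)
weak = np.array([0, 0.4, 0.4, 0, 0, 0, 0, 0])    # vesicles collapsing at the lower pressure
strong = np.array([0, 0, 0, 0, 0.3, 0.3, 0, 0])  # vesicles needing the higher pressure
frames = [
    background + weak + strong,  # before any pulse
    background + strong,         # after the low-pressure pulse
    background,                  # after the high-pressure pulse
]
recovered_weak, recovered_strong = unmix_by_collapse(frames)
print(recovered_weak)
print(recovered_strong)
```

The ordering matters: the weaker vesicles must be collapsed first, because a high-pressure pulse would destroy both populations at once.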

The authors compared their ability to moni- 
tor the location of engineered bacteria using 
either ultrasound or a method that tracks 
bacteria expressing a light-emitting mol- 
ecule, using strains that live in a deep inter- 
nal gut region that is difficult to visualize by 
optical methods. The ultrasound approach 
outperformed its luminescent counterpart, 
and ultrasound signals from engineered bac- 
teria provided a high level of spatial resolu- 
tion and reached deep areas that could not be 
monitored by a luminescence-based approach. 
Using ultrasound, the authors detected bacte- 
ria coating the surface of the colon and present 
at cell concentrations similar to those used in 
therapeutic treatments. 

In vivo imaging systems that enable real- 
time monitoring of tumour-infiltrating 
luminescent bacteria can effectively monitor 
bacterium-associated tumours just below the 
skin in mice””’, but are of little use for moni- 
toring more-internal anatomy. Bourdeau and 
colleagues’ ultrasound approach provides 
good images of engineered strains of S. enter- 
ica Typhimurium that reside deep within an 
internal murine tumour (an ovarian adeno- 
carcinoma) that developed from transplanted 
human ovarian-cancer cells. 

This ultrasound technique might also 
be helpful for the validation and tuning of 
approaches that use engineered bacterial 
cells to target tumours. In vivo imaging is an 
important part of assessing these treatments 
in animal models, including determining the 
correct dosage and estimating treatment- 
response times. Even at this proof-of-concept 
level, there is enormous promise that this non- 
invasive method might be used to monitor the 
effect of a bacterium-based cancer therapy in 
an individual over time. This work might also 
offer a tool for the optimization of other thera- 
pies and diagnostics being developed in which 
a synthetic-biology approach is used to engi- 
neer cells to have biological pathways that are 
not normally present in a particular cell type". 

Moreover, Bourdeau and colleagues’ 
work might be complemented by another 
sound-based imaging technique, called 
photoacoustic imaging. In this approach, light
or radio-frequency pulses trigger a thermal
expansion of target tissues that generates 
acoustic waves’. Integrating photoacoustic 
imaging with the authors’ method could allow 
the precise location of bacteria to be deter- 
mined alongside detailed information of the 
surrounding tissue in vivo. 

Other extensions and applications of the 
work by Bourdeau and colleagues can be 
envisaged. For example, engineered groups of 
bacteria’*"* might be designed to produce an 
ultrasound signal in response to specific ranges 
of physiological and environmental conditions 
in the gut. And bacterial cells engineered to 
respond if they interact with gut cells might 
help to trace the gut’s functional biogeography. 
The ability to selectively control the expression 
of the acoustic reporter genes could be help-
ful in designing experiments to monitor how
newly introduced bacteria colonize the gut or 
to observe the destruction of bacterial patho- 
gens over space and time during therapy. 

Perhaps this new technique could also be 
used to study systems beyond the body, such 
as the microbial ecosystems in healthy or 
damaged soil habitats. The soil can have a rich 
microbial community, and the spatial ecol- 
ogy of soil microbes is not fully understood”. 
Charles Darwin's image of a “tangled bank” of 
complex organismal interactions is relevant to 
both the ecological networks in the soil and the 
complexity of the cellular interactions in the 
gut. Flexible investigation tools are needed to 
understand these types of ecology, and future
studies building on the work of Bourdeau and
colleagues to report precise, acoustic-based
imaging of the spatial dynamics of cells might
be a crucial step forward.
Escape from senescence
boosts tumour growth 


Some chemotherapies block cancer growth by driving tumour cells into a state of 
cell-division arrest termed senescence. It emerges that such cells have a boosted 
capacity to drive tumour growth if they exit senescence. SEE LETTER P.96 


JAN PAUL MEDEMA 


When cells encounter certain types of stress,
they can enter a state of cell-division arrest
termed senescence’, which is usually
thought to be irreversible. Senescence protects 
organisms from potentially dangerous cellular 
proliferation, for example by preventing cell 
division after severe DNA damage. Many anti- 
cancer therapies cause cancer-cell senescence, 
which is considered to be a positive outcome of 
such treatment. However, in a paper online in 
Nature, Milanovic et al.’ reveal the unexpected 
twist that chemotherapy-induced senescence 
might generate tumour cells that have an 
enhanced potential to drive tumour growth if 
they exit senescence.

Senescence induction has been studied
intensively for decades. The phenomenon 
was first described in fibroblast cells grown 
in vitro, and entry into the senescent state in
this context was considered to be a hallmark 
of cellular ageing’. Subsequent research has 
revealed that the induction of senescence is 
a cellular response that occurs during both 
physiological and pathological processes’. 

The protein p53 is one of the key proteins 
that can act as a cellular sensor and drive a 
cell to enter senescence. It responds to DNA 
damage, and its action can cause permanent 
cell-cycle arrest by activating the proteins
p16INK4a and p21. A senescent state can also
be promoted by addition of methyl groups to 
specific amino-acid residues on histone pro- 
teins that bind DNA“. This methylation results 
in chromosomal compaction, which keeps 
the DNA in a transcriptionally inactive con- 
formation and thus helps to make entry into 
senescence irreversible. 

Figure 1 | Enhanced tumour growth when cancer cells exit senescence. Milanovic et al.’ studied tumour
growth in mice using genetically engineered lymphoma tumours. These tumours contain proliferating
cells and a low proportion of cancer stem cells. In this model system, chemotherapy treatment induces
the tumour cells to enter a non-proliferating state called senescence (and addition of the drug tamoxifen
regulates the activity of proteins that are needed for the senescent state). The authors found that senescence
was associated with the activation of the Wnt signalling pathway. Removal of tamoxifen enabled the
tumour cells to exit senescence. These senescence-evading tumours had a high proportion of cancer stem
cells and were faster-growing compared to the tumour state before chemotherapy treatment. If tamoxifen
removal was combined with the addition of Wnt inhibitors, exit from senescence was not associated with
faster tumour growth or the presence of a high proportion of cancer stem cells.

It had been previously observed that some
proteins that regulate the senescent state
also have key functional roles in stem cells®.
Milanovic and colleagues” investigated whether 
there might be a stem-cell connection to senes- 
cence induced by chemotherapy. Analysing 
gene-expression profiles in mice in a type of 
cancer called lymphoma, the authors observed 
that cellular signalling pathways activated 
during chemotherapy-induced senescence 
are similar to the gene-expression patterns 
observed in stem cells, patterns that collectively 
define a cellular state called stemness. 

Stem cells are at the top of the cell-division 
hierarchy and are thought to be able both to 
divide indefinitely and to generate the distinct 
cells present in a given tissue’. Moreover, stem
cells have also been found within tumours, and 
experimental evidence indicates that cancer 
stem cells can drive cancer growth, as well as 
aiding tumour-cell migration and dispersal to 
other locations in the body in a process called 
metastasis’. 

It seems counter-intuitive that the induc- 
tion of senescence in cancer cells that arrests 
tumour growth would drive the gene-expres- 
sion programs associated with the stem cells 
that drive the disease. Yet the authors consist- 
ently made this observation when they investi- 
gated a variety of cancer model systems of both 
human and mouse origin. 

To investigate whether these acquired 
stemness features affect growth when 
cancer cells escape from senescence, Milanovic 
and colleagues used a genetically engineered 
tumour in mice in which a state of cell-cycle 
arrest could be maintained by administration 
of the drug tamoxifen (Fig. 1). Surprisingly, the 
authors observed that cells exiting senescence 
when tamoxifen was removed have a greater 
capacity to drive tumour growth than do 
control tumour cells that did not go through 
a senescent phase. The authors therefore con- 
clude that senescence induction in cancer 
could have an unexpected ‘dark side’ if such
tumour cells break through the cell-cycle- 
arrest barrier. 

This is not the first indication that senes- 
cence might come at a cost. For example, 
senescent cells secrete a range of cytokine pro- 
teins that have a tumour-promoting effect on 
cancer cells in the vicinity by stimulating the 
stem-cell properties of such cells”*. Milanovic 
and colleagues’ work, however, goes beyond 
observations of an indirect effect by reveal- 
ing that senescent tumours have an intrinsic 
capacity to form an increased proportion of 
cancer stem cells. Although a role in this pro- 
cess for cytokines produced by other cells is 
not definitively excluded, the authors’ single-cell
analysis is consistent with the phenomenon 
being cell-autonomous. This analysis reveals 
that cancer cells that have senescent hallmarks 
can, on release from senescence, proliferate
and show hallmarks of cancer stem cells. The 
cancer-stem-cell features gained by these post- 
senescent cells cannot be explained by these 
cells simply being a cancer-cell subset that 
failed to enter senescence, because the authors 
show that entering senescence is a requirement 
for this process to occur. 

The authors found a link between the acti- 
vation of the Wnt signalling cascade and the 
senescent state. The observation that this well- 
studied stem-cell signalling pathway is acti- 
vated during senescence provides additional 
confirmation of the surprising link with the 
induction of stem-cell characteristics. How- 
ever, it is not clear why this pathway is acti- 
vated. Nor is it clear whether Wnt ligands are 
secreted by senescent cells and whether such 
ligands then act on the same cell that secretes 
the protein or on neighbouring cells. 

Notably, this finding also offers a means of 
targeting the potentially harmful effects of the 
cancer stem cells generated. The authors found 
that treatment of cells with a Wnt-pathway
inhibitor could decrease tumour growth on 
exit from senescence. This discovery should 
be investigated in the clinic to determine 
whether it could enhance the effectiveness of 
chemotherapy. 

Although these studies provide strong 
evidence for a close link between senescence 
and stemness, most of the work used a geneti- 
cally engineered model system that allows 
exit from senescence to be controlled at will 
by removing a drug. How cancer cells might 
naturally break through senescence barri- 
ers in vivo, and whether this might be linked 
to acquisition of cancer stemness, should be 
investigated. The authors tried to address this 
by analysing spontaneous escape from senes- 
cence in samples of cancer cells from their 
mouse model grown in vitro, and also detected 
increased cancer stemness features in this con- 
text. Additional confirmation of these findings 
in non-genetically modified cancer models will, 
however, be needed. Nevertheless, Milanovic
and colleagues’ data provide compelling
evidence in the systems they studied that, 
when cancer cells escape from senescence, they 
have an enhanced capacity to drive tumour 
growth — a finding that has potential clinical 
implications.
Mean global ocean temperatures during
the last glacial transition 


Bernhard Bereiter, Sarah Shackleton, Daniel Baggenstos, Kenji Kawamura & Jeff Severinghaus


Little is known about the ocean temperature’s long-term response to climate perturbations owing to limited observations 
and a lack of robust reconstructions. Although most of the anthropogenic heat added to the climate system has been 
taken up by the ocean up until now, its role in a century and beyond is uncertain. Here, using noble gases trapped in 
ice cores, we show that the mean global ocean temperature increased by 2.57 ± 0.24 degrees Celsius over the last glacial
transition (20,000 to 10,000 years ago). Our reconstruction provides unprecedented precision and temporal resolution 
for the integrated global ocean, in contrast to the depth-, region-, organism- and season-specific estimates provided by 
other methods. We find that the mean global ocean temperature is closely correlated with Antarctic temperature and has 
no lead or lag with atmospheric CO2, thereby confirming the important role of Southern Hemisphere climate in global
climate trends. We also reveal an enigmatic 700-year warming during the early Younger Dryas period (about 12,000 
years ago) that surpasses estimates of modern ocean heat uptake. 


Today, the global ocean takes up about 93% of the excess heat from anthropogenic activities, which dominates the current global radiation imbalance. Owing to the heterogeneity and size of the global ocean, it is difficult to measure its heat content and mean (global) ocean temperature (MOT) precisely. A large number of sensors are needed to track regional changes and derive global trends, as in the Argo float array project. Nevertheless, this system does not yet cover much of the deep ocean (depth below 2,000 m), leaving uncertainty in the MOT estimates for the current warming. For changes in MOT before the Argo float system started (around AD 2000), the data basis is much weaker, because the observations were much sparser. Considering that the slow overturning time of the global ocean (centuries to millennia) determines the responsiveness of MOT to changing climate, there is much interest in reconstructing ocean temperatures before the first observations (about AD 1872).

Marine proxies have produced such reconstructions on a variety of temporal and spatial scales; however, the different proxies have strengths and weaknesses, leading to debate about the interpretation of the corresponding data (ref. 4 and references therein). The difficulty lies in separating temperature from other effects, as well as in assessing a precise proxy-to-temperature transfer function, because of the complex biogeochemistry behind these proxies and potential regional as well as temporal differences. Although trends in these proxies might be representative of the temperature trends, these issues are particularly problematic for the absolute accuracy of the corresponding temperature scale. The uncertainty of the absolute scale is of the order of ±1 °C, which poses a major limitation for determining the glacial–interglacial MOT change (about 3 °C).

Here we use a proxy for MOT introduced in ref. 9 based on measurements of inert or noble gas mixing ratios (Kr/N2, Xe/N2, Xe/Kr) in ice core samples (see Methods and ref. 10 for analytical details). The data are used to reconstruct past MOT with unequalled accuracy, taking advantage of the following characteristics of the ocean–atmosphere system: (1) any heat and gas exchange takes place at the ocean–atmosphere interface; (2) there are no significant internal heat sources or sinks in the ocean; (3) there are no significant sources or sinks of the measured gases in the combined ocean–atmosphere system; and (4) each gas species has a unique and well-defined temperature-dependent solubility. Therefore, a change in MOT leads to a change in the dissolved noble gas inventory of the ocean, which is in turn mirrored by an opposing change in the atmosphere without any intrinsic temporal delay or filtering (see detailed discussion in Methods). Because the atmosphere is well mixed, this method effectively integrates globally. Thus, as opposed to marine proxies, the atmospheric noble gas ratio is a purely physics-driven proxy for the global ocean heat content and MOT.

We analysed 78 ice samples (including ten partial to full sample rejections; see Methods) from the WAIS Divide ice core that cover the period from the Last Glacial Maximum (LGM) to the pre-industrial period. For the period 22–8 kyr BP (thousands of years before ‘present’, that is, AD 1950), which contains the last glacial transition (20–10 kyr BP), a high temporal resolution of 250 yr on average was obtained. Together with the rich information available from the same ice core and the excellent age control in this climate archive, our record allows unprecedented insights into the interplay between climate and MOT during a period of major climate change.


Inferring MOT from noble gases 

To derive the atmospheric ratios needed for the MOT reconstruction, the raw data have to be corrected for gravitational enrichment and thermal fractionation in the firn column. As in refs 9 and 10, we use the measured argon isotope ratio δ⁴⁰Ar (⁴⁰Ar/³⁶Ar) to correct the elemental ratios for gravitational fractionation. The correction we apply assumes that the firn air column is in full thermal–gravitational equilibrium, which might not have been the case, as indicated by the difference between δ⁸⁶Kr (⁸⁶Kr/⁸²Kr) and δ⁴⁰Ar (see Methods). This anomaly in δ⁸⁶Kr needs to be investigated further; however, it is roughly constant over the entire record, suggesting that the potential bias on relative changes within the record is small (although it might shift the absolute scale by about 0.3 °C; see below and Methods for more details).
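As a rough illustration of the gravitational correction described above, the sketch below converts δ⁴⁰Ar into an enrichment per unit mass difference and removes the corresponding amount from each elemental ratio. It is a simplified sketch, not the paper's full correction: it assumes a purely linear dependence on mass difference, per mil δ values, and the isotopes ⁸⁴Kr, ¹³²Xe and ²⁸N2, and it ignores thermal fractionation and the δ⁸⁶Kr anomaly. The function name and input values are hypothetical.

```python
"""Hypothetical sketch: linear gravitational correction of an elemental ratio.

Assumptions (not the paper's full procedure): delta values in per mil,
enrichment proportional to mass difference, thermal fractionation ignored.
"""

# delta-40Ar is the 40Ar/36Ar ratio, i.e. a 4 mass-unit difference.
AR_MASS_DIFF = 40 - 36

# Mass differences for the elemental ratios used in the MOT reconstruction,
# assuming the isotopes 84Kr, 132Xe and 28N2.
RATIO_MASS_DIFF = {
    "Kr/N2": 84 - 28,
    "Xe/N2": 132 - 28,
    "Xe/Kr": 132 - 84,
}


def gravity_corrected(delta_ratio: float, delta_40ar: float, ratio: str) -> float:
    """Return the elemental ratio (per mil) with gravitational enrichment removed."""
    per_mass_unit = delta_40ar / AR_MASS_DIFF  # gravitational enrichment per mass unit
    return delta_ratio - per_mass_unit * RATIO_MASS_DIFF[ratio]


# Illustrative numbers only: a firn sample with delta-40Ar = 1 per mil.
for name in RATIO_MASS_DIFF:
    print(name, gravity_corrected(delta_ratio=20.0, delta_40ar=1.0, ratio=name))
```

Ratios with a large mass difference, such as Xe/N2, receive the largest correction, which is why an accurate δ⁴⁰Ar measurement matters for every sample.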
Organizations of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8518, Japan. Department of Polar Science, Graduate University for Advanced Studies (SOKENDAI), 10-3
Figure 1 | Schematic of the four-box model used to derive MOT, including the modern (‘Today’) and LGM characteristics of the boxes. The shape and location of the boxes indicate roughly their zonally averaged situation in the modern ocean. Black arrows indicate the meridional circulation pattern of the two deep-water masses AABW and NADW. White arrows indicate the exchange of noble gases between the boxes and the geographical areas in which it occurs. The modern temperatures T, volumes V (as fraction of the total ocean) and salinities S (in units of the practical salinity scale, PSS) of AABW and NADW are based on ref. 20, while the parameters for the residual ocean are chosen such that the budget for the global average ocean (T = 3.53 °C; S = 34.72 PSS; V = 100%, or 1.34 × 10¹⁸ m³) is closed. The LGM parameters are based on the scaling of volume and salinity as well as the constraints from the noble gas data (see Methods for more details).


The thermal fractionation correction is minor at the WAIS Divide ice core site owing to high accumulation rates and the gradual surface temperature changes, which limit the temperature differences over the length of the firn column to about 1 °C. The effects are, however, not negligible (approximately 0.25 °C change in MOT per 1 °C difference). Therefore, we correct our data for thermal fractionation using two independent firn column temperature scenarios, which represent the range of uncertainty of this correction (see Methods). For our analysis below, we combine the two scenarios in a Monte Carlo fashion to incorporate this uncertainty into our final best-estimate record.
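The numbers in this paragraph translate into a quick bound and a Monte Carlo mixing step. The sketch below is a minimal illustration under two assumptions: the quoted sensitivity of about 0.25 °C of MOT per 1 °C of firn temperature difference is treated as linear, and each realization uses one of the two firn-temperature scenarios with equal probability. The function names and scenario values are hypothetical.

```python
import random

# Sensitivity quoted in the text: ~0.25 °C MOT change per 1 °C firn temperature difference.
MOT_PER_FIRN_DEGREE = 0.25


def thermal_bias(firn_delta_t: float) -> float:
    """MOT bias (°C) implied by a firn temperature difference, assuming linearity."""
    return MOT_PER_FIRN_DEGREE * firn_delta_t


def mix_scenarios(mot_scenario_a: float, mot_scenario_b: float, n: int, seed: int = 0) -> list[float]:
    """Draw n realizations, each using one of the two thermal-correction scenarios.

    Equal weighting of the two scenarios is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [rng.choice((mot_scenario_a, mot_scenario_b)) for _ in range(n)]


# A ~1 °C firn temperature difference maps to a ~0.25 °C MOT effect.
print(thermal_bias(1.0))

# Hypothetical MOT values (°C) for one sample under the two scenarios.
draws = mix_scenarios(-2.5, -2.7, n=1000)
print(min(draws), max(draws))
```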

To reconstruct MOT from the palaeo-atmospheric Kr/N2, Xe/N2 and Xe/Kr ratios, we use a four-box ocean–atmosphere model based on refs 9 and 10 (Fig. 1 and Methods). To account for changes in sea-level pressure, ocean volume and salinity, which affect the inventory of soluble gases in the ocean, we use the sea-level record of ref. 14. For each gas ratio, 12,000 Monte Carlo MOT realizations are calculated that incorporate analytical uncertainties, uncertainties of the sea-level record, the degree of gas saturation, and those related to the firn thermal correction mentioned above (more details in Methods). We combine all realizations (36,000 in total) into a single best-estimate record (Fig. 2, red, ‘Mix’). In this way, the resulting uncertainty accounts for inconsistencies between the estimated and effective thermal fractionation factors, for biases of the single-ratio MOT records (see Methods), and for all known model and analytical uncertainties. Thus, our uncertainty estimate is representative of the relative changes within our MOT record. Note that the uncertainty does not account for the potential bias induced by the firn air disequilibrium mentioned above. Figure 3b shows a splined version of our best-estimate record with a low cut-off frequency so as not to dampen sharp features in our record; however, caution is required when interpreting excursions based on single data points, such as the one around 20 kyr BP.
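The pooling step can be sketched as follows. Monte Carlo realizations from the three gas ratios are concatenated into one 36,000-member ensemble, so the spread of the pooled ensemble reflects both the within-ratio uncertainty and any offsets between the single-ratio records. The Gaussian draws, the dictionary inputs and the example values are assumptions made for illustration; this is not the paper's error model.

```python
import random
import statistics

N_PER_RATIO = 12_000
RATIOS = ("Kr/N2", "Xe/N2", "Xe/Kr")


def pooled_estimate(central: dict, sigma: dict, seed: int = 1):
    """Pool Gaussian MOT realizations from each gas ratio into one ensemble.

    central, sigma: hypothetical mean and 1-sigma MOT (°C) per ratio for one sample.
    Returns (ensemble size, pooled mean, pooled standard deviation).
    """
    rng = random.Random(seed)
    pool = []
    for ratio in RATIOS:
        pool.extend(rng.gauss(central[ratio], sigma[ratio]) for _ in range(N_PER_RATIO))
    return len(pool), statistics.fmean(pool), statistics.stdev(pool)


# Hypothetical single-sample values (°C relative to today).
n, mean, sd = pooled_estimate(
    central={"Kr/N2": -2.5, "Xe/N2": -2.6, "Xe/Kr": -2.8},
    sigma={"Kr/N2": 0.3, "Xe/N2": 0.25, "Xe/Kr": 0.4},
)
print(n, round(mean, 2), round(sd, 2))
```

Even with zero within-ratio scatter, disagreement between the three ratios alone produces a non-zero pooled spread, which is the sense in which the combined uncertainty "accounts for biases of the single-ratio MOT records".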


Glacial-interglacial MOT difference 
On the basis of our best-estimate record, we determine the MOT change from the LGM to the Early Holocene (averaging periods marked by grey bars in Fig. 3) to be 2.57 ± 0.24 °C (1σ). This is comparable to the estimates from marine proxies of 3 ± 1 °C. The major contribution to the uncertainty estimate originates from a possible change in the saturation state of the gases in the ocean. Today, the deep-water masses are slightly undersaturated with noble gases with respect to the water temperature. During the LGM this undersaturation could have been reduced by about 50%, which would cause a bias in the LGM MOT of 0.24 °C in our best-estimate record (see Methods for more details). All other sources of uncertainty are of minor or negligible importance for this part of the analysis.

Figure 2 | MOT records relative to today derived from three different atmospheric noble gas ratios and their mixture. The records are based on 69 individual ice core samples with a distinct age (WD2014 age scale), and each sample provides a separate value for atmospheric Kr/N2, Xe/N2 and Xe/Kr (if not subject to rejections; see Methods). Dashed vertical lines and labels mark different time periods (B/A, Bølling–Allerød; YD, Younger Dryas), as also in Fig. 3. The ‘Mix’ MOT record (red; best estimate) is not shifted, whereas the records based on the individual ratios are shifted as follows for better visibility: Kr/N2 (orange) by −1 °C, Xe/N2 (magenta) by −2 °C, Xe/Kr (purple) by −3 °C. Deviations of the individual records relative to each other are discussed in Methods. The mean values and their error bars (1σ) include all analytical uncertainties and different scenarios as described in Methods.

Even though MOT changes are related indirectly to average sea surface temperature (ASST) changes, which are in turn related to global average surface temperatures (GAST) (both important numbers for estimates of Earth system sensitivity), it is not straightforward to constrain the LGM–Holocene ASST or GAST change from the MOT change we derive here. The main deep-water masses, such as Antarctic Bottom Water (AABW) and North Atlantic Deep Water (NADW), which today represent about 55% of the global ocean volume, are ventilated and thermally equilibrated in high-latitude areas around 60°. Therefore, MOT is biased towards the polar regions in its representation of ASST. Furthermore, multiple lines of evidence suggest that the glacial deep-water circulation was fundamentally different from today's, with a more stratified ocean and a larger AABW cell at the expense of the other water masses. On the one hand, because surface temperature changes are amplified at higher latitudes compared to lower latitudes (a well-known climate phenomenon called polar amplification), one could argue that our LGM–Holocene MOT change represents an upper limit of the average SST change. On the other hand, it is not clear by how much the changes in ocean circulation have affected the relevant areas for global ocean ventilation.

Figure 3 | Comparison of our best-estimate MOT record with other palaeoclimatic records for the last glacial transition. Labels as in Fig. 2. The grey bars mark the sections used to derive the LGM–Holocene MOT difference. a, MOT change rate and corresponding global ocean heat flux derived from Monte Carlo splining of our best-estimate MOT dataset with 600-yr cut-off frequency splines. The uncertainty band (dashed lines) represents the 1σ range of all realized Monte Carlo splines. b, The red lines are the splined version of our best-estimate MOT dataset (Fig. 2, red) using the same splining procedure as in a. Note that caution is required when interpreting excursions based on single data points, such as the one around 20 kyr BP (this also applies to a). The light-blue lines are the energy anomaly in the total ocean relative to today, expressed in the same type of spline as the red curve. The left y axis is scaled such that the light-blue and red curves overlap as much as possible; the remaining small difference originates from the different effect of ocean volume change on the two parameters. Crosses indicate where the actual data points are located. The dark-blue lines are the sea-level anomaly record of ref. 14 converted into the latent energy put into melting (grounded) ice to create the corresponding sea-level change (the LGM lowstand corresponds to a sea level 134 m below today's). The splining procedure is the same as above, but with a cut-off frequency of 150 yr (because of the higher resolution of this record) and a 2σ uncertainty band. The latent heat is derived by simple scaling of the sea-level data by 3.45 × 10¹⁴ m³ of ocean volume change per metre of sea level and the latent heat coefficient for the ice–water transition (the thermal expansion contribution, about 0.6 m between the LGM and the Holocene, can be neglected). c, Antarctic temperature reconstruction. d, Mean annual insolation anomaly relative to today at 60° N and 60° S (roughly where deep waters are formed), which is driven by changes in obliquity and is symmetric in both hemispheres. e, Greenhouse gas forcing. f, Reconstructed Earth surface temperatures with 1σ uncertainty bands for the Northern Hemisphere (‘NH’, light blue), the Southern Hemisphere (‘SH’, dark blue) and the global average (‘Global’, black). g, Atmospheric CH4 measured in the WAIS Divide ice core. h, AMOC proxy ²³¹Pa/²³⁰Th from ocean sediment core OCE326-GGC5, recalibrated with IntCal13. All data are plotted on their original age scale if not otherwise noted (WD2014 for WAIS data). Note that the data shown in b–f are anomalies relative to today.

[Figure 3 axes: MOT change rate (10⁻³ °C yr⁻¹); ocean heat flux (W m⁻²); energy (10²⁵ J); MOT (°C); Antarctic temperature (°C); mean annual insolation at 60° N/S (W m⁻²); greenhouse gas forcing (W m⁻²); surface temperatures (°C).]

To explore these different aspects that link ASST and GAST to MOT, we evaluated oceanic and atmospheric temperature fields of seven different global climate models (six are part of the Paleoclimate Modelling Intercomparison Project 3 (PMIP3)) that provided such output for LGM and preindustrial conditions (see Methods). All these independent state-of-the-art climate models have different but physically consistent climatologies for the two climate states; for this reason, the model ensemble spread is representative of the uncertainties in how MOT, ASST and GAST are linked. The model ensemble ranges of the scaling factors ΔASST/ΔMOT and ΔGAST/ΔMOT are 0.7–0.9 and 2.0–2.9, respectively. The models generally underestimate the LGM–Holocene MOT difference (range 0.9 °C to 2 °C) relative to our results. Despite the uncertainties related to these scaling factors, they suggest that the LGM–Holocene GAST difference is between 5.1 °C and 7.5 °C, which is roughly consistent with the estimates of refs 8 and 19, but not with the low values of ref. 6 and in particular of ref. 26. Note that most of these studies use PMIP climatologies to infer GAST as we do here; however, they use surface temperature proxies that record local climate and are affected by ocean biogeochemistry. Owing to the globally integrative and purely physics-driven nature of the MOT proxy we present here, it might be possible to better constrain such estimates in the future and narrow down some of the uncertainties related to the LGM GAST.
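The GAST range quoted above follows from multiplying the MOT change by the model-derived scaling factors. A minimal check, using only the best-estimate ΔMOT (its ±0.24 °C uncertainty is deliberately not propagated here) and the ensemble ranges given in the text:

```python
D_MOT = 2.57  # LGM-Holocene MOT change (°C), best estimate

# Model-ensemble ranges of the scaling factors quoted in the text.
ASST_PER_MOT = (0.7, 0.9)
GAST_PER_MOT = (2.0, 2.9)


def scaled_range(delta_mot: float, factors: tuple[float, float]) -> tuple[float, float]:
    """Scale a MOT change by the lower and upper factor, rounded to 0.1 °C."""
    low, high = factors
    return round(delta_mot * low, 1), round(delta_mot * high, 1)


print("ASST change:", scaled_range(D_MOT, ASST_PER_MOT))
print("GAST change:", scaled_range(D_MOT, GAST_PER_MOT))
```

The GAST line reproduces the 5.1–7.5 °C range; the ASST line is the same arithmetic applied to the ΔASST/ΔMOT factors.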

It is interesting to note that since the LGM about the same amount of energy has gone into warming the ocean (MOT) as into melting grounded ice (Fig. 3b). This does not contradict the understanding that most of the current anthropogenic warming has been taken up by the ocean, even though only about 10 cm of sea-level rise (about half of the total rise of 19 cm since 1900) is attributed to melting of grounded ice, whose latent heat equivalent is only about 3% of the total energy taken up by the ocean. The response of melting land ice to global warming depends strongly on the geometry, configuration and sensitivity of the global ice sheets at a specific point in time. Therefore, the 1:1 ratio of energy going into the ocean and into melting grounded ice has to be regarded as an average over the whole last glacial transition and cannot be expected to hold for the anthropogenic warming. However, as a recent study has shown, including ice melting is also important for closing the current global energy budget and can provide new insights into the mechanism behind recent decadal global temperature variability.
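A back-of-the-envelope check of the roughly 1:1 energy partition can be made with the ocean volume from the Fig. 1 caption, the 3.45 × 10¹⁴ m³ per metre of sea level and the 134 m LGM lowstand from the Fig. 3 caption, and textbook values for seawater density, seawater specific heat and the latent heat of fusion of ice. Those three constants are assumptions of this sketch, which also ignores the smaller LGM ocean volume and the energy needed to bring ice to its melting point.

```python
OCEAN_VOLUME_M3 = 1.34e18        # modern ocean volume (Fig. 1 caption)
SEAWATER_DENSITY = 1025.0        # kg m^-3, assumed typical value
SEAWATER_CP = 3990.0             # J kg^-1 K^-1, assumed typical value
D_MOT = 2.57                     # LGM-Holocene MOT change (°C)

VOLUME_PER_METRE_SL = 3.45e14    # m^3 of water per metre of sea level (Fig. 3 caption)
SEA_LEVEL_CHANGE_M = 134.0       # LGM lowstand relative to today (m)
FRESHWATER_DENSITY = 1000.0      # kg m^-3, assumed
LATENT_HEAT_FUSION = 3.34e5      # J kg^-1, latent heat of fusion of ice


def ocean_warming_energy() -> float:
    """Energy (J) needed to warm the whole ocean by D_MOT."""
    return OCEAN_VOLUME_M3 * SEAWATER_DENSITY * SEAWATER_CP * D_MOT


def ice_melt_energy() -> float:
    """Latent heat (J) needed to melt the grounded ice behind the sea-level rise."""
    melt_mass_kg = SEA_LEVEL_CHANGE_M * VOLUME_PER_METRE_SL * FRESHWATER_DENSITY
    return melt_mass_kg * LATENT_HEAT_FUSION


ratio = ocean_warming_energy() / ice_melt_energy()
print(f"ocean: {ocean_warming_energy():.2e} J, ice: {ice_melt_energy():.2e} J, ratio {ratio:.2f}")
```

With these assumptions the two terms come out at about 1.4 and 1.5 × 10²⁵ J, within roughly 10% of each other, which is consistent with the partition described above.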


Climate-MOT interplay 

There is no temporal uncertainty between the MOT and CH4 records (Fig. 3g) because they were obtained from trapped air in the same ice core. Atmospheric CH4 reacts quickly (within decades) to changes in the northern and tropical regions and has been measured with very high resolution and precision. Therefore, it is an excellent time marker for the abrupt changes in Northern Hemisphere climate (dashed lines in Figs 2 and 3) related to variations in the Atlantic Meridional Overturning Circulation (AMOC), which separate the climate periods Heinrich Stadial 1 (HS1), the Antarctic Cold Reversal and the Younger Dryas from each other. This allows a precise comparison between MOT and the changing climate and ocean circulation associated with these climate periods (Fig. 3).

First, the comparison of the inflection points of MOT and abrupt changes in the CH4 record shows no lead or lag of MOT relative to these events (with the exception of the end of the Younger Dryas; see below). In particular, for the transition from HS1 to the Antarctic Cold Reversal, the temporal constraints are strong owing to the high resolution of both the MOT and the CH4 records. For this event we estimate the MOT inflection point to occur at 14,780 ± 390 yr BP. This is indistinguishable from the timing of the corresponding CH4 change at 14,580 ± 80 yr BP. This constrains any possible phase shift between the CH4/AMOC change and MOT to be within a couple of centuries, at least for this point in time.
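Whether the 200-yr offset between the two inflection points is meaningful can be checked against their combined uncertainty. The sketch assumes independent, Gaussian 1σ errors that add in quadrature, which is a simplification; the function name is illustrative.

```python
import math


def timing_offset(t_a: float, sigma_a: float, t_b: float, sigma_b: float):
    """Return (offset, combined 1-sigma, offset in units of sigma) for two dated events."""
    offset = t_a - t_b
    combined = math.hypot(sigma_a, sigma_b)  # quadrature sum of independent errors
    return offset, combined, abs(offset) / combined


# MOT inflection vs CH4 change at the HS1/Antarctic Cold Reversal transition (yr BP).
offset, combined, z = timing_offset(14_780, 390, 14_580, 80)
print(f"offset {offset} yr, combined 1-sigma {combined:.0f} yr, {z:.1f} sigma")
```

The offset is about half of the combined 1σ uncertainty of roughly 400 yr, so the two timings are statistically indistinguishable, and the MOT dating error dominates the budget.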

Second, the trends in the MOT record we present here are strikingly similar to those of Antarctic temperature (AAT) during the last glacial transition (Fig. 3). AAT and MOT show the same general evolution: stable temperatures during the LGM, followed by a moderate warming during HS1 (17,690–14,580 yr BP), a cooling during the Antarctic Cold Reversal (14,580–12,750 yr BP) and a strong warming during the Younger Dryas (12,750–11,550 yr BP), before reaching stable Holocene values. In fact, the Younger Dryas MOT warming finished about 500 yr before the rapid CH4 rise at 11,550 yr BP that marks the end of the Younger Dryas. The end of the Younger Dryas is an anomaly in the otherwise close relationship between MOT and AAT during the last glacial transition.

During the HS1 period, MOT changes at a rate of 0.67 ± 0.11 mK yr⁻¹, which corresponds to an energy uptake by the ocean of (3.6 ± 0.52) × 10²¹ J yr⁻¹ (all errors given in this paragraph are 1σ). This is about 30% of what ref. 1 estimates for the ocean heat uptake between 1997 and 2015 ((12.4 ± 5.0) × 10²¹ J yr⁻¹). The Antarctic Cold Reversal period is characterized by a statistically significant cooling of the global ocean of −0.29 ± 0.13 mK yr⁻¹, which translates into an energy loss of (−1.4 ± 0.66) × 10²¹ J yr⁻¹. The warming from 12,750 yr BP to 12,050 yr BP (referred to as YD1) within the Younger Dryas represents the strongest global ocean warming phase within our record. The MOT change rate is 2.5 ± 0.53 mK yr⁻¹ and the corresponding energy uptake is (13.8 ± 2.9) × 10²¹ J yr⁻¹. This unprecedented natural MOT warming rate is comparable to the strong warming since 1997 estimated in ref. 1, but clearly surpasses the estimate therein for the multidecadal trend from 1971 to 2005 (see below). The close relation between our MOT record and AAT/AMOC changes, as well as the strong warming during YD1, are two intriguing features of our record and are discussed in more detail below.
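The conversion from MOT change rate to ocean energy uptake is essentially a multiplication by the heat capacity of the whole ocean. The sketch below approximates that heat capacity with the modern ocean volume and assumed typical seawater density and specific heat, so it only approximately reproduces the values above (all three periods land within the quoted uncertainties). The constants and function name are assumptions of this sketch.

```python
# Approximate whole-ocean heat capacity (J/K): modern volume (Fig. 1 caption)
# times assumed seawater density (1025 kg m^-3) and specific heat (3990 J kg^-1 K^-1).
OCEAN_HEAT_CAPACITY = 1.34e18 * 1025.0 * 3990.0


def energy_uptake(rate_mk_per_yr: float) -> float:
    """Ocean energy uptake (J/yr) for a MOT change rate given in mK/yr."""
    return rate_mk_per_yr * 1e-3 * OCEAN_HEAT_CAPACITY


for period, rate in [("HS1", 0.67), ("ACR", -0.29), ("YD1", 2.5)]:
    print(f"{period}: {energy_uptake(rate):.2e} J/yr")

# Ratio of the quoted HS1 uptake to the quoted 1997-2015 estimate.
print(f"HS1 / 1997-2015: {3.6e21 / 12.4e21:.2f}")
```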

The synchronicity of MOT and AAT during the last glacial transition is somewhat surprising because AAT (and atmospheric CO2) seems to lead global averaged surface temperatures (GAST) by several centuries (Fig. 3f). However, this is not a contradiction, because the lag of GAST relative to AAT/CO2 is explained by a lag of the Northern Hemisphere temperatures (N-GAST), while the Southern Hemisphere temperatures (S-GAST) are synchronous with (or even lead) AAT/CO2. MOT is an S-GAST-biased parameter owing to the larger volume of the ocean ventilated in the Southern Hemisphere, so the synchronicity of MOT and AAT/CO2 is consistent with GAST lagging AAT/CO2, as found in ref. 6. The general picture arising from this is that MOT, CO2 and S-GAST change synchronously (within the given uncertainties) while N-GAST lags during the last glacial transition. With the deglacial atmospheric CO2 rise attributed to the release of CO2 from the Southern Ocean, this suggests that (at least for this transition) the Southern Hemisphere climate, not the Northern Hemisphere, was driving the global climate out of the glacial period. The similarity between AAT/AMOC and MOT could be explained if only the waters ventilated at high southern latitudes have a net effect on MOT. Through the well-known AMOC-related meridional surface heat transport mechanism called the bipolar seesaw, Southern Ocean surface temperatures increase when the AMOC is in a weak state, and vice versa. These surface temperature changes may have reached the southern deep-water formation areas and subsequently changed the temperatures of the AABW, which comprises a large portion of the global ocean volume. Changes in other regions might not necessarily have a net effect on MOT. This simple explanation suggests that the current ocean heat uptake could indeed be underestimated or undersampled, given that AABW forms in the Southern Ocean and fills the bottom part of the ocean below 2,000 m, areas that are inadequately covered by observation systems such as the Argo floats.

However, this purely Southern-Ocean-driven explanation for the AMOC–MOT relation might be too simplistic. The basic behaviour of MOT increasing during a weak AMOC, and vice versa, is seen in two model experiments, but there it is explained by changes in the low-latitude ocean. The change in AMOC affects the heat capacity of the low-latitude Atlantic, which leads to accumulation of heat in this region after a switch from a strong to a weak AMOC (such as from the LGM to HS1) and a release of heat in the opposite case (such as from HS1 to the Bølling–Allerød period). This mechanism produces MOT patterns and rates of change in the experiments of ref. 33 that are very similar to what we find for the HS1 and Bølling–Allerød periods, providing some support for this underlying mechanism. However, this mechanism is not sufficient to explain the MOT pattern and rates of change during the Younger Dryas, where we find a much stronger warming in the first phase (about 700 yr), followed by temperature stabilization. In fact, this pattern is more comparable to what ref. 11 simulated in their AMOC disturbance experiments, though the magnitude of change in these experiments is quite different. In summary, the relationship between AMOC strength and MOT is a consistent feature in the few model studies that investigate the tie between these parameters, but none of these studies replicates the temporal pattern or magnitude of MOT change observed in this record.

So far we have looked into the ways that changes in AMOC could affect MOT. The causality, however, may be reversed: MOT may affect the AMOC. As shown in ref. 34, changes in Southern Ocean surface heat flux can affect the stability of the AMOC. If southern heat fluxes are high, the AMOC is stronger, and vice versa, because a warmer/colder Southern Ocean is associated with a warmer/colder AABW, which reduces/increases the density difference between NADW and AABW and, hence, increases the pull/push on the AMOC. In fact, the two causal relations mentioned here (the effect of the AMOC on MOT and vice versa) could provide a feedback loop that explains the fluctuations of the AMOC characteristic of glacial periods: during a weak AMOC state, the Southern Ocean/AABW warms, which decreases the density difference between NADW and AABW, continuously increasing the ‘pull’ on the AMOC. Once the ‘pull’ becomes too large, the AMOC switches to its strong state, which in turn starts cooling AABW, making it again harder for the AMOC to sustain its strength as AABW becomes denser again. In other words, the bipolar seesaw and the teleconnection between the Southern Ocean and the AMOC together would make up a density oscillator that could, depending on the background ocean temperatures or stratification, be self-sustaining and not necessarily triggered by a North Atlantic surface perturbation, which is often thought to be the cause behind the glacial AMOC fluctuations. This density oscillator is probably not only temperature-driven but also involves salinity changes. As outlined in ref. 22, Southern Ocean temperatures also affect the sea-ice extent, and the associated effect of brine rejection on the salinity/density of Southern Ocean waters potentially exceeds the temperature effect on AABW density by up to a factor of five. The idea described here needs thorough testing with ocean models and does not explain, for example, the abruptness that is characteristic of glacial AMOC changes. However, it provides an alternative to the otherwise North-Atlantic-focused explanations for these oscillations and is in line with the MOT record presented here.


Younger Dryas warming 

The strong YD1 MOT warming is a striking element of our record and represents a clear anomaly in the otherwise strong link between MOT, AAT and AMOC. The event starts at the same time as the corresponding warming events seen in the AAT and GAST records, but MOT shows a clearly higher warming rate and reaches its Holocene level considerably earlier. The correction of our data for the firn fractionation processes is critical, but neither do the stable isotope data used to derive this correction show any inconsistency, nor does the uncertainty in the thermal correction have enough leverage to explain this event (see Methods).

There is an unexpected change in the accumulation rate in the WAIS Divide ice core from 12,000 yr BP to 11,600 yr BP, which could cause poorly understood dynamic firn fractionation processes, but this event had no effect on the YD1 part of the noble gas record because the air was already trapped in the ice before the accumulation event started (the uncertainty in gas age versus ice age is only ±50 yr). Therefore, the YD1 noble gas changes found here seem to be truly atmospheric. We cannot exclude the possibility that the ocean circulation pattern shifted rapidly from its potential glacial state to its modern state during YD1, which could dampen the YD1 MOT change by up to 0.35 °C (the sum of the Kr and Xe saturation state and the AABW volume biases; see Methods), because we currently assume a gradual change. There is no evidence that such a change happened specifically at this point in time, so we continue with the gradual-change assumption. Nevertheless, even with this 22% leverage to dampen the YD1 MOT event, YD1 remains an extreme event in terms of MOT warming.
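The 22% figure is the ratio of the maximum possible dampening (0.35 °C) to the YD1 MOT warming. A quick check, which also shows the warming rate that would remain if the full 0.35 °C were removed; the 1.6 °C over about 700 yr is the YD1 warming quoted in the following discussion of model experiments, and spreading the dampening uniformly over those 700 yr is an assumption of this sketch.

```python
YD1_WARMING_C = 1.6        # YD1 MOT warming (°C)
YD1_DURATION_YR = 700      # approximate duration of YD1 (yr)
MAX_DAMPENING_C = 0.35     # largest possible dampening from a rapid circulation shift (°C)
HS1_RATE_MK_PER_YR = 0.67  # HS1 MOT change rate for comparison (mK/yr)

leverage = MAX_DAMPENING_C / YD1_WARMING_C
damped_rate = (YD1_WARMING_C - MAX_DAMPENING_C) / YD1_DURATION_YR * 1e3  # mK/yr

print(f"leverage: {leverage:.0%}")
print(f"damped YD1 rate: {damped_rate:.2f} mK/yr vs HS1 {HS1_RATE_MK_PER_YR} mK/yr")
```

Even in this worst case, the remaining YD1 rate is more than twice the HS1 rate.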

The YD1 phase is associated with a strong ocean heat uptake of 1.1 ± 0.23 W m⁻² (1σ), but the greenhouse gas forcing is basically stable, the orbital forcing change is negligible, the sea-level record does not indicate any major losses of land ice or albedo (Fig. 3b), and other processes tend rather towards a slight negative radiative forcing. This suggests that the YD1 MOT warming is driven by ocean dynamics rather than by radiative forcing changes. The drainage of Lake Agassiz probably drove the AMOC changes during the Younger Dryas; however, AMOC-disturbance experiments using intermediate-complexity climate models either do not reproduce the high MOT warming rate of YD1 (1.6 °C in about 700 yr) or fail to sustain this high rate over the observed period. This suggests that AMOC changes can explain only part of the YD1 MOT warming. In experiments using state-of-the-art global climate models forced by anthropogenic greenhouse gas emissions, none of the 15 models (individually averaged over all realizations) reaches the warming rate of YD1 averaged over 1971–2005 (35 yr). The mean rate over all models is about a third of the YD1 warming rate, even though the greenhouse-gas radiative forcing is at least ten times stronger than during YD1. In summary, the YD1 MOT warming challenges the current understanding of global ocean temperature regulation and suggests either that current climate models generally underestimate the ability of the ocean to take up heat, or that climate conditions/drivers during YD1 were substantially different from those in the model experiments mentioned here in a way that allowed much stronger heat uptake. Two ideas about possible conditions/drivers behind the YD1 warming are discussed further in Methods; they relate, respectively, to the strong insolation at high latitudes during YD1 (see Fig. 3d) and to an isolated water mass combined with a drastic change in the global ocean overturning circulation.

In summary, the MOT reconstruction for the last glacial transition we present here constrains MOT with unprecedented accuracy using a novel proxy based on noble gases in the atmosphere. The record provides unique insights into the energy budget of the currently largest energy buffer in the climate system, the ocean, and its interplay with changing climate and ocean circulation. The insights we gain here raise questions about how the ocean regulates its temperature under variable conditions, questions that are very important for future climate change but have not yet been studied extensively owing to a lack of long-term reconstructions. We describe here the general features of the data and possible explanations for them, but further work using global climate models is needed to test our hypotheses.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Deriving noble gas elemental and isotope ratios from ice cores. The analytical method we used to analyse the trapped air in the ice samples is described in ref. 10. Briefly, about 800 g of ice is melted in an evacuated vacuum vessel and the released air is cryo-trapped in a dip tube cooled with liquid helium. In a second step, the air is split into two subsamples, and all non-noble gases are removed from one of them with a Zr/Al getter. Each of the two subsamples is then analysed separately on a dedicated dual-inlet isotope ratio mass spectrometer. The two machines provide high-precision deviations (usually expressed in δ-notation) from a standard, which in our case is the current atmospheric composition. Specifically, the two machines provide the following main isotope ratios (mass ratios): δ¹⁵N (²⁹N₂/²⁸N₂), δ⁴⁰Ar (⁴⁰Ar/³⁶Ar) and δ⁸⁶Kr (⁸⁶Kr/⁸²Kr); as well as the following main elemental ratios: δAr/N₂ (⁴⁰Ar/²⁸N₂), δKr/Ar (⁸⁴Kr/⁴⁰Ar) and δXe/Ar (¹³²Xe/⁴⁰Ar). The elemental ratios δKr/N₂ (⁸⁴Kr/²⁸N₂), δXe/N₂ (¹³²Xe/²⁸N₂) and δXe/Kr (¹³²Xe/⁸⁴Kr), which are used for the MOT reconstruction, are derived by combining the machine elemental ratios accordingly. The isotope ratios are used to correct for gravitational and thermal fractionation in the firn column, as described in Methods subsection ‘Inferring atmospheric noble gas ratios from the raw data’.
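The combination of the machine ratios into the three MOT ratios follows from the fact that the underlying abundance ratios multiply, for example (⁸⁴Kr/⁴⁰Ar)·(⁴⁰Ar/²⁸N₂) = ⁸⁴Kr/²⁸N₂. A minimal Python sketch, assuming all inputs are δ values in per mil relative to the same atmospheric standard; the function names are illustrative and not part of the published processing code:

```python
def chain(d_a_b: float, d_b_c: float) -> float:
    """delta(a/c) in permil from delta(a/b) and delta(b/c): abundance ratios multiply."""
    return ((1 + d_a_b / 1000) * (1 + d_b_c / 1000) - 1) * 1000


def divide(d_a_c: float, d_b_c: float) -> float:
    """delta(a/b) in permil from delta(a/c) and delta(b/c): abundance ratios divide."""
    return ((1 + d_a_c / 1000) / (1 + d_b_c / 1000) - 1) * 1000


def mot_ratios(d_ar_n2: float, d_kr_ar: float, d_xe_ar: float):
    """Derive (dKr/N2, dXe/N2, dXe/Kr) from the measured Ar/N2, Kr/Ar and Xe/Ar deltas."""
    d_kr_n2 = chain(d_kr_ar, d_ar_n2)   # (84Kr/40Ar) * (40Ar/28N2)
    d_xe_n2 = chain(d_xe_ar, d_ar_n2)   # (132Xe/40Ar) * (40Ar/28N2)
    d_xe_kr = divide(d_xe_ar, d_kr_ar)  # (132Xe/40Ar) / (84Kr/40Ar)
    return d_kr_n2, d_xe_n2, d_xe_kr


# Illustrative (made-up) input deltas in permil
print(mot_ratios(d_ar_n2=1.0, d_kr_ar=-2.0, d_xe_ar=-5.0))
```

Because the δ values are small, the product rule is close to a simple sum (δKr/N₂ ≈ δKr/Ar + δAr/N₂), but the cross term δ₁δ₂/1000 can reach the per-meg level, so the exact form is preferable at this precision.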

The dataset presented here was obtained over the course of three measurement campaigns in 2014 and 2015. The first campaign applied method 1 described in ref. 10, during which 21 samples of the WAIS Divide ice core were analysed. The results of two of these samples were fully or partly rejected owing to measurement artefacts or to artefacts occurring in the bubble-to-clathrate-transition zone (BCTZ) of ice cores (see Methods subsection ‘Sample rejection and the data gap from 4,000–7,500 yr ago’ below). The second and third campaigns applied method 2 of ref. 10, in which 42 and 15 samples, respectively, from the same core were analysed. Six samples of the second campaign were partly or fully rejected for the same types of reasons as mentioned above; two rejections were required for the third-campaign samples.
Sample rejection and the data gap from 4,000–7,500 yr ago. Ten of the 78 samples we measured for this study are subject to rejection. For 3 of them, however, not the entire dataset had to be rejected (partial rejections). Partial rejections can occur when a measurement error occurs after the sample splitting (ref. 10), thus affecting only the corresponding subsample dataset. Another possibility is that a minor error affects only the parameters that are most sensitive to it: for example, a thermal gradient during the splitting process will affect δAr/N₂ the most because of its strong thermal diffusion sensitivity® relative to the precision obtained’. Depending on the amplitude of such an error, some parameters might appear as outliers while others do not. It is therefore important to check all parameters thoroughly and individually and to put them into the context (if possible) of the whole record, as done for the example of the BCTZ in ref. 10. For the first case (affecting one subsample), we have two instances in which the primary heavy noble gas data were lost owing to a failure of the corresponding mass spectrometer. For the second case (single-parameter outliers), we rejected the data including Xe but kept the remaining parameters. These affected samples could be replaced by measuring a neighbouring sample.

The full rejections affect 7 samples, of which one is related to operational errors during the measurement procedure and another to a contaminant in the sample. These two samples could also be replaced by measuring a neighbouring sample. The remaining 5 full rejections are related to gas fractionation in the BCTZ, which creates a data gap’? in our record from about 4,000–7,500 yr BP that can only be filled with measurements from another core. In the BCTZ, gases are fractionated by gas loss and by fractionation processes between the bubbles and clathrates occurring in this zone**“°. We identified this zone primarily by inconsistencies or outliers in δAr/N₂ with respect to δ⁴⁰Ar (⁴⁰Ar/³⁶Ar), as seen in ref. 44, but we also looked for inconsistencies in all other observed isotope and elemental ratios’. The BCTZ is also known as the brittle ice zone*® because of the very brittle behaviour of the ice core, and it is often reported as such by the drilling team. However, the BCTZ as we observe it through the gas measurements does not necessarily line up with the BCTZ observed via core quality or via the appearance/disappearance of clathrates and bubbles in the ice. The reason is that at the upper end of the BCTZ some fractionation has to build up before the gas diffusion processes produce noticeable effects in the extracted air, and at the lower end the gas fractionation can ‘tail’ into the fully clathrated ice zone*®. Hence, we expect the BCTZ-related alterations observed in the gas measurements to be shifted downwards in depth compared with the zone defined by core quality and inclusion observations; however, it is not clear to what extent. It was a goal of this study and of ref. 10 to identify the BCTZ-affected zone for the parameters we obtained.

The top end of the BCTZ-affected zone was found between 922 m and 1,120 m depth and the bottom end between 1,510 m and 1,572 m depth”, while the core quality and inclusion observations place the BCTZ or ‘brittle ice zone’ at 520–1,310 m depth*”**. This large shift of several hundred metres is surprising and has not so far been observed in other gas records; however, it is specific to this ice core and to the gases we observe here, and it could also vary between different methods for the same gas species. Nevertheless, it is interesting to note that the gas fractionation effects of the BCTZ affect our data in a depth interval that is considerably deeper and slightly narrower than the ice observations suggest.

A further quality control was performed by comparing the reconstructed atmospheric δ¹⁸O (³⁴O₂/³²O₂) values with the record of ref. 49. However, this control turned out not to be very sensitive and did not uncover any outliers beyond those already identified with the parameters mentioned above. Nevertheless, it is important to check all these parameters to ensure the consistency of the wealth of data the method provides, because many elements of this complex method can alter the measurement”. The high quality of the record (outside the BCTZ) is probably attributable to careful core handling and processing under cold conditions (the ice-processing tent in the field was actively cooled to −25 °C)* and to keeping our subsamples in a −50 °C freezer whenever possible to prevent outgassing”!.
Potential biases in MOT from noble gases in ice core samples. Changes in the concentrations or ratios of the most widely studied atmospheric gases in ice cores (CO₂ and O₂) result from a combination of complex biogeochemical processes reacting or adapting to changing climate****. Therefore, these well-studied gases contain an intrinsic delay and low-pass filtering behaviour with respect to climate change that depends on the inertia of the underlying mechanisms. In contrast, the noble gases analysed in this study are not subject to any biogeochemical process, and their atmospheric changes depend only on their physical transport in the atmosphere–ocean system. For our application, the relevant physical transport processes are (1) the exchange between ocean and atmosphere, (2) the mixing within the atmosphere and (3) the transport from the atmosphere into the ice. We discuss these three elements in detail to show that they do not create a temporal modulation of the observed noble gases with respect to MOT.

All heat fluxes into and out of the ocean take place at the ocean–atmosphere interface. There is no internal heat source in the ocean, and geothermal heating (the most potent heat source for the ocean besides the atmosphere/surface) is negligible compared with the forcing at the surface (ref. 11). Hence, if noble gas transport across the ocean–atmosphere interface follows the equilibrium solubility function, as assumed here, then for each joule going into or out of the ocean a corresponding number of noble gas molecules is released from or dissolved in the ocean, respectively. Internal mixing of water masses with different temperatures mixes joules and noble gases in the same way. Although this would lead to local solubility disequilibrium in the mixed waters owing to the nonlinearity of the solubility functions, it does not affect the measured atmospheric composition, because the process takes place inside the ocean.

The assumption of gas equilibrium is justified because the gas transfer velocity of the observed gases between the surface ocean and the atmosphere lies in the range 13–16 cm h⁻¹ (3 °C water temperature, 10 m s⁻¹ wind speed)’, which translates into an equilibration timescale for these gases of one to two months for a 200-m-thick mixed layer, as found in polar regions (equilibration is faster in other regions). This is short enough to capture the strong seasonality in the hemispheric ocean heat fluxes, as evidenced by atmospheric measurements” of Ar/N₂, and it is also much shorter than the residence time of water parcels in the mixed layer, in particular in the Southern Ocean, where gas equilibration is most critical*. There is a slight disequilibrium of noble gases in the deep ocean (refs 15, 16), but this does not affect the relatively fast equilibration of the surface ocean. However, it has implications for the absolute scale of our proxy, as discussed below in Methods subsection ‘Box model to infer MOT’. For these reasons, ocean–atmosphere gas exchange does not create any delay or low-pass filtering of atmospheric noble gases with respect to climate change/ocean temperature changes in our record. This is also supported by the model simulations of ref. 11, which include physical gas exchange processes and ocean circulation in a three-dimensional model. The ocean circulation perturbation experiments in that study do not show any temporal modulation between the modelled ocean temperature and atmospheric noble gases.
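The quoted equilibration timescale can be checked with a back-of-the-envelope estimate. This sketch assumes a simple e-folding time τ = h/k for a well-mixed layer of thickness h and gas transfer velocity k, a first-order approximation that ignores bubble injection and mixed-layer entrainment:

```python
HOURS_PER_DAY = 24.0
DAYS_PER_MONTH = 30.4  # mean month length, used only for display


def equilibration_days(k_cm_per_h: float, mixed_layer_m: float = 200.0) -> float:
    """E-folding gas equilibration time (days) of a mixed layer: tau = h / k."""
    k_m_per_h = k_cm_per_h / 100.0  # convert cm/h to m/h
    return mixed_layer_m / k_m_per_h / HOURS_PER_DAY


for k in (13.0, 16.0):
    days = equilibration_days(k)
    print(f"k = {k:4.1f} cm/h -> tau = {days:5.1f} d ({days / DAYS_PER_MONTH:.1f} months)")
```

For 13–16 cm h⁻¹ and a 200 m mixed layer this gives roughly 52–64 days, consistent with the one-to-two-month range stated above; because τ scales linearly with h, thinner mixed layers elsewhere equilibrate proportionally faster.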

Mixing within the atmosphere also takes place on timescales of months to a year, so the studied gas mixing ratios probably contain geospatial differences on seasonal timescales** comparable to those of Ar/N₂. However, these seasonal variations are smoothed out in the air trapped in ice core samples because of the low-pass filtering of the stagnant firn air column, through which atmospheric signals have to be transported before they are trapped in the ice*°. The filter timescale for the WAIS Divide ice core varies”” between 20 yr and 50 yr, meaning that the air trapped in the ice is an average over these time periods. This filter timescale is in fact exceptionally short by Antarctic ice core standards and results from the high accumulation rate at the site, which is why the WAIS Divide ice core provides excellent temporal resolution for trapped gases. The firn filtering timescale is much shorter than our closest sample spacing of about 110 yr and is also substantially below the 600-yr cut-off period that we apply in the data splining. For all these reasons, the noble gas records presented here contain no intrinsic temporal damping element such as is known to occur in other atmospheric gas records, and they are (within the given uncertainties and current understanding) a direct representation of MOT. There are, however, processes that can alter the scaling between noble gases and MOT; these are discussed and quantified in Methods subsections ‘Inferring atmospheric noble gas ratios from the raw data’ and ‘Box model to infer MOT’.
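Why a 20–50-yr firn smoothing removes seasonal variability while preserving signals at the 600-yr spline cut-off can be illustrated with the amplitude response of a boxcar (running-mean) filter, |sin(πW/P)/(πW/P)| for a sinusoid of period P and window W. The real firn gas age distribution is not a boxcar, so this is only an order-of-magnitude sketch under that simplifying assumption:

```python
import math


def boxcar_gain(period_yr: float, window_yr: float) -> float:
    """Amplitude gain of a running mean of width window_yr applied to a sinusoid of period period_yr."""
    if window_yr == 0:
        return 1.0  # no smoothing
    x = math.pi * window_yr / period_yr
    return abs(math.sin(x) / x)


for window in (20.0, 50.0):
    seasonal = boxcar_gain(1.0, window)
    centennial = boxcar_gain(600.0, window)
    print(f"W = {window:.0f} yr: 1-yr gain {seasonal:.1e}, 600-yr gain {centennial:.3f}")
```

For any window of at least 20 yr the seasonal gain is bounded by 1/(πW/P), that is, below about 2%, whereas a 600-yr oscillation keeps roughly 99% of its amplitude.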

There does exist a scenario under which our noble gas data would be blind to MOT changes: if a large portion of the ocean exchanged heat with the atmosphere without exchanging gases. The corresponding water masses would be characterized by disequilibrium between temperature and dissolved noble gases, with the same magnitude of disequilibrium for all noble gases. Today, such waters seem not to exist, because all deeper ocean water masses found so far contain an amount of noble gases corresponding to their temperature (refs 15, 16) (with a tendency to noble gas undersaturation, which, however, is caused by fast cooling and is not of the same magnitude for all noble gases; see also Methods subsection ‘Inferring atmospheric noble gas ratios from the raw data’). The glacial ocean circulation pattern suggested in ref. 22 could have favoured the production of such ‘blind’ water masses during the LGM; however, it is important to note that our data would only be affected if these water masses were completely isolated from the atmosphere while exchanging heat before sinking into the deep ocean (conceivable if there were a gas-impermeable sea ice layer through which heat could be conducted, so that the waters underneath would sink into the deep ocean without any further atmospheric contact). If the sea ice were only partly or slowly permeable to noble gases, or if the waters had only a very short exposure time to the atmosphere (expected if polynyas, areas of open sea surrounded by ice, were as important for deep-water formation as they are today’’), the ‘blindness’ would no longer exist. As soon as a slight exchange of gases occurred, Kr would come closer to equilibrium than Xe because of the faster equilibration time of Kr (a concept similar to that behind the fast-cooling effect'*). In such a situation our data would show a discrepancy between the MOT signal in δXe/N₂ and that in δKr/N₂ (because we assume constant equilibration over time; see also Methods subsection ‘Inferring atmospheric noble gas ratios from the raw data’) and would hence indicate such a process, which is not the case. This scenario of 100% decoupling for a large portion of the ocean is conceivable for a Snowball Earth, but seems very unrealistic and hypothetical for the LGM, because there is no indication that deep waters formed in such a way. However, further studies with state-of-the-art climate models are needed to rule out these unrealistic but not yet excludable effects. Note that if the LGM ocean had had such a ‘blind’ water mass, the transition from ‘blind’ to ‘not blind’ would have needed to happen immediately, because an ‘in-between’ state should appear as a phase of discrepancy between the MOT values from δXe/N₂ and δKr/N₂, which is not the case.
Inferring atmospheric noble gas ratios from the raw data. The heavy noble gas ratios we obtain from the ice core samples are highly fractionated with respect to the atmospheric value, mainly owing to gravitational fractionation in the static firn air column at the top of an ice sheet, below which the air is trapped in the ice. The depth of this firn column changes over time and is influenced by, among other things, the local snow accumulation rate and temperature>*. The effective firn air depth at a specific point in time can be ‘measured’ by analysing the stable isotope ratios of N₂ (δ¹⁵N), Ar (δ⁴⁰Ar), Kr (δ⁸⁶Kr) and Xe (δ¹³⁶Xe). By combining these ratios it is also possible to resolve the minor thermal and kinetic fractionation processes that might have occurred®’. The conditions required for kinetic fractionation as described in ref. 59 (very low accumulation rate, low temperature) do not apply to the WAIS Divide ice core drill site!°°; we therefore do not consider this effect in our calculations and consider only gravitational and thermal fractionation. With the method used in this study we obtain the atmospherically stable ratios δ¹⁵N (²⁹N₂/²⁸N₂), δ⁴⁰Ar and δ⁸⁶Kr (⁸⁶Kr/⁸²Kr) with a precision that enables us to resolve the thermal and gravitational fractionation processes adequately’®.

In theory, if all the air fractionation processes occurring in the firn column are known, the differences between the measured isotope ratios can be used to reconstruct the thermal fractionation component using the well-known thermal diffusivity parameters””. Since we have three isotope ratio pairs but only one fractionation effect that should affect these values, the system is over-determined and we can check whether it is consistent for all possible combinations. However, any combination that includes δ⁸⁶Kr to determine the thermal component results in a temperature difference of 1.5 °C to 2 °C between the top and bottom of the firn column (referred to as the ‘firn thermal gradient’) for the LGM and Holocene periods, which is unrealistic given the stable surface temperatures during these periods!’. About the same constant offset is found during the transition period relative to the modelled firn thermal gradients of ref. 36. If δ¹⁵N and δ⁴⁰Ar are used, the thermal component is in rough agreement with expectations throughout the whole record. We have thoroughly tested our method for possible analytical artefacts that could fractionate or contaminate δ⁸⁶Kr, without finding any. Moreover, any such artefact would already have been largely corrected, because we reference our ice sample measurements to modern air samples that are measured following the principle of identical treatment”.

To circumvent δ⁸⁶Kr in a first step, we use an independent scenario of the firn thermal gradient based on ref. 36. After applying this scenario to the data, we follow the approach of ref. 9 and use δ⁴⁰Ar to obtain the gravitational correction component for all other elements. δ⁴⁰Ar has the smallest analytical uncertainty per mass unit, 1.5 per meg (that is, 1.5 × 0.001‰) on average, and hence provides the highest possible accuracy for this largest, but well-defined, correction factor. The isotope data corrected using this approach (Extended Data Fig. 1) clearly show that δ⁸⁶Kr is depleted relative to δ¹⁵N and δ⁴⁰Ar (referred to as the ‘Kr anomaly’), which is why the firn thermal gradients based on δ⁸⁶Kr mentioned above turn out wrong. We believe that this Kr anomaly is a true signal in the trapped air, probably caused by a firn fractionation mechanism that is as yet unknown. Further investigations at other sites are needed to better understand the mechanism behind it.
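In a linearized form, this gravitational correction can be sketched as follows. The sketch assumes that gravitational enrichment scales linearly with mass difference (a first-order approximation of the barometric equation), takes δ⁴⁰Ar after removal of its thermal part, divided by its 4-amu mass difference, as the enrichment per mass unit, and treats any thermal correction of the target ratio as an external input. The names and the numbers in the usage line are hypothetical:

```python
# Mass differences (amu) of the ratios discussed in the text
MASS_DIFF = {
    "d15N": 29 - 28,     # 29N2/28N2
    "d40Ar": 40 - 36,    # 40Ar/36Ar
    "dKr/N2": 84 - 28,   # 84Kr/28N2
    "dXe/N2": 132 - 28,  # 132Xe/28N2
    "dXe/Kr": 132 - 84,  # 132Xe/84Kr
}


def grav_per_amu(d40ar: float, d40ar_thermal: float = 0.0) -> float:
    """Gravitational enrichment per mass unit (permil/amu) from delta 40Ar/36Ar."""
    return (d40ar - d40ar_thermal) / MASS_DIFF["d40Ar"]


def to_atmosphere(name: str, d_measured: float, g_per_amu: float, d_thermal: float = 0.0) -> float:
    """Remove the linearized gravitational (and an optional thermal) component, in permil."""
    return d_measured - MASS_DIFF[name] * g_per_amu - d_thermal


# Hypothetical example: 1.2 permil gravitational d40Ar, measured dXe/N2 of 27.2 permil
g = grav_per_amu(1.2)
print(round(to_atmosphere("dXe/N2", 27.2, g), 3))
```

In this made-up example the 104-amu mass difference of δXe/N₂ turns a 0.3‰ per amu enrichment into a 31.2‰ correction, leaving about −4‰; this shows why the small per-mass-unit uncertainty of δ⁴⁰Ar matters so much for the heavy elemental ratios.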

The Kr anomaly seems to consist mainly of a fairly constant offset of −56 per meg relative to the other isotopes, without any obvious trends or changes over time (Extended Data Fig. 1). This indicates that the underlying mechanism is fairly stable over time, which is why we correct the raw δ⁸⁶Kr data by this average offset. If we use the corrected δ⁸⁶Kr values and compare the firn thermal gradients based on the different isotope pairs again, the results are consistent with each other (the gradients involving δ⁸⁶Kr now provide realistic values comparable to those based on δ¹⁵N and δ⁴⁰Ar for the whole record period).

We therefore derived a second scenario for firn thermal gradients based on the measured isotopes (including the corrected δ⁸⁶Kr) by averaging the gradients derived from the three possible isotope pairs (see Extended Data Fig. 1b). This data-based scenario is independent of the first, model-based scenario of ref. 36, and together the two scenarios represent the uncertainty range associated with the thermal-correction component in our study. We account for this uncertainty range in our final MOT record by combining the 3,000 Monte Carlo MOT realizations of each scenario and propagating this uncertainty element into our final record (see Methods subsection ‘Box model to infer MOT’ for more details). In general, the uncertainty associated with this thermal correction is comparable to that originating from the analytical uncertainties. The analytical uncertainties translate into an MOT uncertainty of about 0.2 °C (see Methods subsection ‘Potential biases in noble gases from ice samples as a proxy for MOT’), whereas the effect of the two scenarios on our MOT estimate is within about 0.25 °C (corresponding to a 1 °C difference in firn thermal gradient between the scenarios).
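The pairwise consistency check and the averaging can be sketched under a linear model in which each isotope δ value is the sum of a gravitational term (mass difference × enrichment per mass unit g) and a thermal term (Ω × ΔT). Dividing each δ by its mass difference removes g, so any two isotopes yield ΔT. The Ω coefficients below are hypothetical placeholders chosen only to make the example run, not the thermal diffusion parameters used in the study; the −56 per meg Kr offset is the value given above:

```python
from itertools import combinations

# (mass difference in amu, thermal sensitivity Omega in permil/K).
# The Omega values are HYPOTHETICAL placeholders, not published coefficients.
ISOTOPES = {
    "d15N": (1.0, 0.0145),
    "d40Ar": (4.0, 0.0390),
    "d86Kr": (4.0, 0.0300),
}
KR_ANOMALY = -0.056  # mean Kr offset of -56 per meg, expressed in permil


def pair_gradient(a: str, b: str, deltas: dict) -> float:
    """Firn thermal gradient (K) from two isotope deltas, eliminating gravity.

    Assumes delta_i = dm_i * g + Omega_i * dT, so delta_i / dm_i differs
    between isotopes only through the thermal term.
    """
    ma, oa = ISOTOPES[a]
    mb, ob = ISOTOPES[b]
    return (deltas[a] / ma - deltas[b] / mb) / (oa / ma - ob / mb)


def mean_gradient(raw: dict) -> float:
    """Average the three pairwise gradients after removing the constant Kr anomaly."""
    deltas = dict(raw)
    deltas["d86Kr"] = raw["d86Kr"] - KR_ANOMALY  # add back the 56 per meg
    pairs = list(combinations(ISOTOPES, 2))
    return sum(pair_gradient(a, b, deltas) for a, b in pairs) / len(pairs)
```

With a synthetic sample built from a known ΔT and the −56 per meg offset applied to δ⁸⁶Kr, `mean_gradient` recovers ΔT, while the pairs involving uncorrected δ⁸⁶Kr do not, mirroring the behaviour described in the text.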

We cannot exclude the possibility that the mechanism underlying the Kr anomaly also affects, to some extent, the gas ratios we use to reconstruct MOT (δKr/N₂, δXe/N₂, δXe/Kr). As seen in Extended Data Fig. 2, the reconstructed atmospheric noble gas ratios are depleted during the Holocene, which translates into an average Holocene MOT 0.36 °C below present values, as seen in our MOT record in the main text (Fig. 2). This Holocene MOT ‘offset’ is larger than the observed ocean warming since industrialization (ref. 1) and would hence suggest that substantial MOT warming had already occurred before industrialization. This ‘offset’, however, could also be an artefact, because the mechanism behind the Kr anomaly might also deplete δKr/N₂, δXe/N₂ and δXe/Kr. Since the Kr anomaly seen in Extended Data Fig. 1 is fairly constant over time, its effect on δKr/N₂, δXe/N₂ and δXe/Kr is also expected to be constant over time. We therefore argue that the mechanism behind the Kr anomaly produces, if anything, a constant bias in our MOT record of perhaps −0.36 °C, but does not alter the relative changes within our record. Relative changes, such as the Holocene–LGM MOT difference or the MOT trends of the different periods, are thus not affected by this potential bias and represent the effective changes in MOT. However, readers should be careful in interpreting the absolute values we derive from our records, because of the potential bias described here. Nevertheless, we do not apply any offset correction to our MOT record, as we are not yet confident enough to do so.

Although the conditions at the WAIS Divide site do not match those required for kinetic fractionation as described in ref. 59, we tested this model by interpreting the Kr anomaly as caused by kinetic fractionation and using the model to scale the anomaly to the elemental ratios. With this approach, the resulting MOT records for the Late Holocene are about 0.25 °C warmer than today and are not consistent with each other for the LGM. Accordingly, the mechanism behind our gas fractionation must be somewhat different from kinetic fractionation.
One way to look at the Kr anomaly is that the heavier, and therefore more slowly diffusing, Kr in the firn air column deviates from the lighter N₂ and Ar isotopes towards a smaller gravitational enrichment. This could be related to the relatively fast transformation of the WAIS Divide firn air column, which could lead to disequilibrium in the firn air such that the slowly diffusing gases cannot ‘catch up’ with the fast downward advection of the ice matrix. This effect would be stronger the more slowly the gases diffuse through the air, which is (to first order) related to the mass of the molecule; hence N₂ and Ar would be less affected than heavier gases such as Kr and Xe. By using the isotopes of a light molecule to correct for gravity (δ⁴⁰Ar in our case), the gravitational component of the heavier molecules might be overestimated. This would be consistent with the depletion in the reconstructed atmospheric δ⁸⁶Kr (Extended Data Fig. 1a), and potentially also in δKr/N₂, δXe/N₂ and δXe/Kr. If this were the case, however, we would expect an even stronger ‘anomaly’ for Xe isotopes (δ¹³⁶Xe) than for Kr isotopes (by about a factor of two, based on the diffusivity in air/total mass). For the data obtained in the last campaign (see Methods subsection ‘Deriving noble gas elemental and isotope ratios from ice cores’) we changed the mass spectrometer method so that we were able to obtain δ¹³⁶Xe (not shown here), though with much worse precision” than for δ⁸⁶Kr. These data indicate no anomaly for δ¹³⁶Xe, which is not what we expected, but the data are sparse and further work is needed before this explanation can be ruled out.

That a Kr anomaly (or δ⁸⁶Kr excess) is indicative of disequilibrium effects in the firn air column is shown by the firn air transport modelling study of ref. 61. The model, however, currently lacks experimental support, so further firn air studies at sites with different firn transformation characteristics are needed. For our purposes, such work would also need to include the effects on the heavy noble gases (isotopes and mixing ratios), in particular δKr/N₂, δXe/N₂ and δXe/Kr. This has the potential to strongly reduce the current uncertainty of our MOT data, on both the absolute and the relative scale.
Box model to infer MOT. To derive MOT from the heavy noble gas data, we use a box model as described in ref. 10. The basic assumption of the model is that N₂, Kr and Xe are conserved in the ocean–atmosphere system and that these gases are in solubility equilibrium between the two reservoirs. Hence, any change in ocean temperature changes the well-defined equilibrium state of the noble gases. Since the solubilities of the individual gases are not equally sensitive to water temperature, an ocean temperature change leads to a change in the atmospheric mixing ratios, which can be observed in ice cores. Here, the model is used in inverse mode: the measured atmospheric ratios are the input, and the corresponding MOT is derived by iteration. We use the same solubility functions as ref. 15 (which uses the solubility function of ref. 62 for N₂, of ref. 63 for Kr and of ref. 64 for Xe), with the same 2% correction to the original Xe solubility function.
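A toy version of the inverse box model can illustrate the iteration: two reservoirs with conserved gas inventories, equilibrium partitioning between them, and a bisection search for the MOT change that reproduces an observed ratio. The partition coefficients and temperature sensitivities below are hypothetical placeholders, not the solubility functions of refs 62–64, and sea-level, salinity and pressure effects are ignored:

```python
import math

# HYPOTHETICAL parameters per gas: (beta0, s)
#   beta0: ocean/atmosphere inventory ratio at today's MOT
#   s:     fractional decrease in solubility per kelvin of warming
GASES = {"N2": (0.005, 0.02), "Kr": (0.02, 0.03), "Xe": (0.06, 0.04)}


def atm_fraction(gas: str, d_mot: float) -> float:
    """Atmospheric inventory relative to today for an MOT change d_mot (K)."""
    beta0, s = GASES[gas]
    beta = beta0 * math.exp(-s * d_mot)  # a warmer ocean holds less gas
    return (1 + beta0) / (1 + beta)      # total inventory is conserved


def delta_ratio(num: str, den: str, d_mot: float) -> float:
    """Atmospheric delta(num/den) in permil relative to today."""
    return (atm_fraction(num, d_mot) / atm_fraction(den, d_mot) - 1) * 1000


def invert_mot(d_obs: float, num: str, den: str,
               lo: float = -10.0, hi: float = 10.0, tol: float = 1e-9) -> float:
    """Find the MOT change (K) that reproduces the observed delta, by bisection."""
    f_lo = delta_ratio(num, den, lo) - d_obs
    f_hi = delta_ratio(num, den, hi) - d_obs
    if f_lo * f_hi > 0:
        raise ValueError("observed ratio lies outside the bracketing interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = delta_ratio(num, den, mid) - d_obs
        if f_mid * f_lo > 0:
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)
```

Because the toy ratios change monotonically with temperature over the bracket, bisection is a robust choice for the iteration; the model described in the text additionally accounts for sea level, salinity, pressure and the other elements discussed below.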

A first, rough validation of this simple box model comes from the work of ref. 9, which showed agreement between MOT derived from noble gas ratio measurements in ice cores and MOT derived independently from ocean sediment core proxies. Furthermore, the simple box model has been tested against a climate model of intermediate complexity (ref. 11), which showed a negligible difference between the two models despite their large difference in complexity. The same study also confirmed that the only non-surface heat source for the ocean, geothermal heating, is too small to affect the noble gas–MOT relation noticeably. However, ref. 11 also implemented a sea ice gas-exchange effect, which resulted in noble-gas-to-MOT relationships that differ from those of the case without sea ice. From the new noble gas data of this study, we can now conclude that their sea-ice effect is overestimated, as the corresponding δXe/N₂ scaling would suggest an unrealistically low LGM MOT, at least 4 °C below today (our −4‰ value for the LGM is no longer covered by their results).

Owing to the much higher quality of the noble gas data presented in this study, smaller effects not considered in ref. 9 can become relevant. We therefore implemented and tested different model elements to assess all possible sources of uncertainty within our box model. An overview of these elements is given in Extended Data Table 1, including their effects on the LGM–Holocene MOT difference. The effects were derived by implementing the elements successively, from the top to the bottom of the table.

The most minimalist model consists of only one ocean box and one atmosphere box and uses only the measured noble gas ratios (δKr/N₂, δXe/N₂ or δXe/Kr) to infer MOT. This model setting suggests an LGM MOT roughly 2.0 °C colder than the Holocene, which is a smaller cooling than the 2.5 °C to 3.5 °C suggested by sediment core proxies and model studies*>”!!. Nevertheless, we can assess the uncertainties of our MOT estimate within this minimalist model. The only source of uncertainty here is the analytical uncertainty, which we propagate into the total MOT uncertainty using 3,000 Monte Carlo simulations (3,000 realizations of MOT values obtained while varying the noble gas ratios within their analytical uncertainties). The corresponding MOT uncertainty is on average ±0.26 °C for δKr/N₂, ±0.15 °C for δXe/N₂ and ±0.17 °C for δXe/Kr, comparable to what ref. 10 reports for the individual methods.
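The Monte Carlo propagation, together with the pooling of realizations from the two thermal-correction scenarios described earlier, can be sketched as follows. For brevity the sketch replaces the box model with a hypothetical linear mapping from ratio to MOT (1.3‰ per K), assumes Gaussian analytical errors, and uses made-up input values and an illustrative 0.25 K offset between scenarios; only the procedure, not the numbers, reflects the text:

```python
import random
import statistics

N_REALIZATIONS = 3000


def mc_mot(d_obs: float, sigma: float, sensitivity: float,
           offset: float = 0.0, seed: int = 0) -> list:
    """Monte Carlo MOT realizations (K) for one observed ratio.

    Assumes a HYPOTHETICAL linear mapping MOT = delta / sensitivity (permil per K)
    and Gaussian analytical errors with standard deviation sigma (permil).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.gauss(d_obs, sigma) / sensitivity + offset for _ in range(N_REALIZATIONS)]


# Two thermal-correction scenarios pooled into one ensemble (illustrative numbers)
scenario_a = mc_mot(-4.0, 0.2, 1.3, seed=1)
scenario_b = mc_mot(-4.0, 0.2, 1.3, offset=0.25, seed=2)
pooled = scenario_a + scenario_b
print(f"mean {statistics.mean(pooled):.2f} K, 1-sigma {statistics.stdev(pooled):.2f} K")
```

Pooling the two ensembles shifts the mean to midway between the scenarios and widens the spread, so the scenario difference enters the final uncertainty in addition to the analytical scatter.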

As ref. 9 has already pointed out, sea-level change has an important effect on the noble gas distribution in the ocean–atmosphere system owing to the associated changes in ocean volume, ocean salinity and sea surface pressure. Salinity and sea surface pressure affect the solubility equilibrium state, and the ocean volume defines the total storage capacity of the ocean. Here we use the sea-level record of ref. 14 to derive these elements. Implementing the sea-level effects increases the LGM–Holocene difference by 0.5 °C, with the largest contribution from the volume effect and the other two effects roughly compensating each other (see Extended Data Table 1). The uncertainty of the sea-level record is also propagated into our total MOT uncertainty estimate; however, its contribution is below 10% of that of the analytical uncertainty.

The two elements included so far correspond to what has already been implemented in previous work. We now investigate further elements that potentially have a considerable effect on our MOT reconstruction. The colder glacial atmosphere is known to have been drier than the interglacial/modern one, that is, it held less water vapour. A lower water content also means a lower total mass of the atmosphere and hence a lower average sea surface pressure. We estimate this effect using the current atmospheric H₂O concentration of about 2.5% (ref. 65) and a Clausius–Clapeyron relation between atmospheric H₂O concentration and temperature®™, taking our MOT differences relative to today as the effective surface temperature change. This approach might slightly underestimate the effective change in H₂O concentration/sea surface pressure, because the average surface temperature change might have been slightly larger**’; however, considering the small effect on the MOT reconstruction and the uncertainties of such global surface temperature estimates, this approach is justified. For the sake of completeness, we implemented a linear change of this effect from the LGM to the beginning of the Holocene in our final MOT record.

The majority of the ocean volume receives its temperature and noble gas imprint in the high latitudes around Antarctica, where the largest portion of the deep water is formed”’. In these regions the average sea surface pressure is about 3% lower than the average over the whole ocean surface®*. We therefore assume a time-independent offset of −3% in the effective sea surface pressure when calculating the solubility equilibrium state in our box model. This slightly reduces the noble gas amounts dissolved in the ocean and makes the noble gas ratios less sensitive to MOT changes. Hence, this effect requires a slightly lower LGM temperature (by 0.05 °C) to compensate for the reduced sensitivity. Regional sea surface pressure changes between glacial and interglacial climates are simulated to be in the range of a few hectopascals®’, which is one to two orders of magnitude smaller than the global sea surface pressure effect of changing sea level. We can therefore assume this pressure bias to be time-independent.

As shown in refs 15 and 16, deep waters today are slightly undersaturated in Kr and Xe with respect to the water temperature. This phenomenon is explained by the strong cooling rate these waters experience before they sink into the deep ocean, which prevents the noble gases from fully equilibrating with the waters before they sink. The observed undersaturation is roughly 2% for Xe and 1.3% for Kr. Owing to the large differences expected between the glacial and modern deep-water circulations, it is possible that this undersaturation pattern was different in glacial periods. As the general overturning of the deep circulation is expected to have been slower, it is likely that the cooling rate was smaller in glacial times and, hence, the undersaturation smaller. The most extreme case, in which noble gases were in full equilibrium in glacial times, leads to unrealistically large discrepancies between the MOTs derived from the different ratios. The change of undersaturation that keeps MOT differences roughly within the allowed uncertainty range is 50% (meaning that Xe undersaturation at the LGM could have been 1% and Kr undersaturation correspondingly 0.65%). This causes the LGM temperature derived from the different ratios to be up to 0.4 °C warmer than with a constant undersaturation (Xe/Kr being most sensitive, followed by Xe/N2, with almost no effect for Kr/N2). Since the effective change in undersaturation is unknown, we calculate MOT realizations for the case with constant undersaturation at all times and for a 50% (linear) change over the course of the LGM-Holocene transition (17,900-11,550 yr BP), and combine the two scenarios for our best-estimate record. This leads to a slight shift of the average MOT towards warmer temperatures and an increase in the uncertainty range for the earlier part of the record (see also LGM-Holocene MOT change estimate below).
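The two undersaturation scenarios can be written as simple functions of age. This is a minimal sketch: the modern values (2% Xe, 1.3% Kr), the 50% glacial reduction and the 17,900-11,550 yr BP ramp come from the text, while the function names and the multiplicative way undersaturation is applied to an equilibrium amount are our assumptions.

```python
TRANSITION_START_BP = 17_900.0   # ramp starts (LGM values before this)
TRANSITION_END_BP = 11_550.0     # ramp ends (modern values after this)
MODERN_UNDERSAT = {"Xe": 0.020, "Kr": 0.013}   # fractional, from the text


def transition_progress(age_bp: float) -> float:
    """0 at or before 17,900 yr BP, 1 at or after 11,550 yr BP, linear between."""
    p = (TRANSITION_START_BP - age_bp) / (TRANSITION_START_BP - TRANSITION_END_BP)
    return min(1.0, max(0.0, p))


def undersaturation(age_bp: float, gas: str, scenario: str) -> float:
    """Fractional undersaturation of `gas` at `age_bp` for one scenario."""
    modern = MODERN_UNDERSAT[gas]
    if scenario == "constant":
        return modern
    if scenario == "50% change":
        glacial = 0.5 * modern
        return glacial + (modern - glacial) * transition_progress(age_bp)
    raise ValueError(f"unknown scenario: {scenario!r}")


def dissolved_amount(equilibrium: float, age_bp: float, gas: str,
                     scenario: str) -> float:
    """Equilibrium amount reduced by the undersaturation (assumed multiplicative)."""
    return equilibrium * (1.0 - undersaturation(age_bp, gas, scenario))


if __name__ == "__main__":
    for age in (20_000, 14_725, 10_000):
        print(age, [round(undersaturation(age, "Xe", s), 4)
                    for s in ("constant", "50% change")])
```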

AABW and NADW, which together represent more than half of the global ocean volume today and probably occupied even more in glacial times, have different characteristics with regard to temperature and salinity (see also Fig. 1). Using only one ocean box in our model, as done so far, implies that the global temperature distribution in the ocean was the same as today and that all water masses changed their temperature equally. However, AABW is −0.88 °C today and its cooling potential is only about 1.1 °C before it reaches the freezing point of seawater (−2 °C), which is not enough to fulfil the constraints on MOT during the LGM from different lines of evidence (the noble gas record provided here, and refs 4 and 9). This simple consideration alone shows that the LGM ocean temperature pattern must have been different from today's. To account for this aspect we split the ocean box into three boxes representing AABW, NADW and all other waters (RES). We set the temperature, volume and salinity of AABW and NADW according to ref. 20 (AABW: −0.88 °C, 35% of total ocean volume, 34.641 PSS; NADW: 2.3 °C, 20% of total volume, 34.886 PSS) and set the RES box such that the averaged ocean corresponds to today's average conditions (3.53 °C, 1.34 × 10¹⁸ m³, 34.72 PSS). In a first experiment we change the temperatures of the different volumes equally as long as AABW does not reach −2 °C. Once it does, AABW temperature is held at −2 °C (non-freezing) and the remainder of the cooling is taken up by the other water masses in equal shares. This requires an LGM MOT 0.2 °C lower, owing to the nonlinearity of the solubility functions, and gives a sense of how strongly a changing temperature distribution can affect our MOT reconstruction.
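The temperature bookkeeping of this first "non-freezing" experiment can be sketched in a few lines. The sketch assumes that the RES box is defined through volume-weighted averages (heat-capacity and density differences ignored) and reads "equal shares" as an equal additional temperature drop in NADW and RES; both are our interpretations. It only redistributes temperature; the noble gas solubility step that turns this into an MOT shift is not reproduced, and all function names are hypothetical.

```python
FREEZING_C = -2.0                  # freezing point used in the text
MEAN_T, MEAN_S = 3.53, 34.72       # modern whole-ocean averages (text)
AABW = {"frac": 0.35, "T": -0.88, "S": 34.641}
NADW = {"frac": 0.20, "T": 2.30, "S": 34.886}


def modern_boxes() -> dict:
    """AABW, NADW and a RES box chosen so the volume-weighted averages
    reproduce MEAN_T and MEAN_S."""
    res_frac = 1.0 - AABW["frac"] - NADW["frac"]
    res_t = (MEAN_T - AABW["frac"] * AABW["T"] - NADW["frac"] * NADW["T"]) / res_frac
    res_s = (MEAN_S - AABW["frac"] * AABW["S"] - NADW["frac"] * NADW["S"]) / res_frac
    return {"AABW": dict(AABW), "NADW": dict(NADW),
            "RES": {"frac": res_frac, "T": res_t, "S": res_s}}


def cool_non_freezing(boxes: dict, cooling: float) -> dict:
    """Lower the volume-weighted mean temperature by `cooling` (degC > 0).

    All boxes cool equally until AABW reaches FREEZING_C; the cooling AABW
    cannot take is then spread over NADW and RES as an equal extra drop.
    """
    out = {name: dict(box) for name, box in boxes.items()}
    headroom = boxes["AABW"]["T"] - FREEZING_C
    if cooling <= headroom:
        for box in out.values():
            box["T"] -= cooling
        return out
    out["AABW"]["T"] = FREEZING_C
    others = ("NADW", "RES")
    missing = boxes["AABW"]["frac"] * (cooling - headroom)
    extra = missing / sum(boxes[n]["frac"] for n in others)
    for n in others:
        out[n]["T"] -= cooling + extra
    return out


def mean_temperature(boxes: dict) -> float:
    return sum(b["frac"] * b["T"] for b in boxes.values())
```

With these inputs the RES box comes out at about 7.5 °C, and any mean cooling beyond about 1.1 °C pins AABW at −2 °C while NADW and RES cool faster than the ocean mean.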

The non-freezing AABW experiment described above follows a somewhat artificial path of the ocean temperature/volume distribution. A more realistic scenario is that AABW volume was larger in glacial times, similar to what ref. 22 describes. We use a scenario in which AABW during the LGM was 40% larger than it is today and shrank linearly over the course of the LGM-Holocene transition (17,900-11,550 yr BP) to the current situation found in ref. 20. We chose 40% because this increase in volume, at the expense of the other (warmer) water masses, roughly compensates the reduced AABW cooling/warming potential. This more realistic (but still arbitrary) scenario halves the effect of a change in the temperature distribution on the LGM-Holocene MOT difference, to −0.1 °C.
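The volume scenario can be expressed as age-dependent volume fractions. This sketch assumes that the 40% enlargement applies to AABW's share of the ocean volume and that NADW and RES give up volume in proportion to their modern shares; the text only says the gain comes at the expense of the other water masses, so the proportional split and the function names are our assumptions.

```python
TRANSITION_START_BP = 17_900.0
TRANSITION_END_BP = 11_550.0
MODERN_FRAC = {"AABW": 0.35, "NADW": 0.20, "RES": 0.45}   # ref. 20 values
GLACIAL_AABW_ENLARGEMENT = 0.40                            # +40% at the LGM


def transition_progress(age_bp: float) -> float:
    """0 at or before 17,900 yr BP, 1 at or after 11,550 yr BP, linear between."""
    p = (TRANSITION_START_BP - age_bp) / (TRANSITION_START_BP - TRANSITION_END_BP)
    return min(1.0, max(0.0, p))


def volume_fractions(age_bp: float) -> dict:
    """Volume fractions of AABW, NADW and RES at a given age (sum to 1)."""
    progress = transition_progress(age_bp)
    aabw = MODERN_FRAC["AABW"] * (1.0 + GLACIAL_AABW_ENLARGEMENT * (1.0 - progress))
    # Shrink the other boxes proportionally so the fractions still sum to 1.
    scale = (1.0 - aabw) / (1.0 - MODERN_FRAC["AABW"])
    return {"AABW": aabw,
            "NADW": MODERN_FRAC["NADW"] * scale,
            "RES": MODERN_FRAC["RES"] * scale}


if __name__ == "__main__":
    for age in (20_000, 14_725, 10_000):
        print(age, {k: round(v, 3) for k, v in volume_fractions(age).items()})
```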

We use this three-box version of the ocean model, including all elements and the AABW volume change scenario described so far, for the MOT reconstructions shown in the main text. The analytical uncertainties and the uncertainties of the sea-level change record are propagated to our MOT estimate by creating 3,000 Monte Carlo MOT realizations for each data point. This procedure is repeated for each combination of the two firn thermal gradient scenarios and the two undersaturation scenarios described earlier, resulting in 12,000 MOT record realizations for each ratio and 36,000 MOT record realizations in total. Our best-estimate record is derived from all these realizations, which provides an objective representation of all uncertainty elements discussed here. For our LGM-Holocene MOT change estimate (see averaging periods in Fig. 3) we also make use of all these realizations; we interpret the propagated measurement and sea-level change uncertainties as stochastic and treat them as normally distributed. However, we treat the uncertainty introduced by the Xe (and Kr) undersaturation effect as non-stochastic because it represents equally likely scenarios. This source of uncertainty is the largest contribution to the overall uncertainty, and with this approach we find an LGM-Holocene MOT difference of 2.57 ± 0.24 °C.
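The bookkeeping of the realization counts (3,000 draws × 2 firn scenarios × 2 undersaturation scenarios = 12,000 per ratio, × 3 ratios = 36,000) can be sketched as follows. The function `toy_lgm_anomaly` is a HYPOTHETICAL stand-in for one box-model inversion and all of its numbers are made up; only the scenario structure and the realization counts come from the text.

```python
import itertools
import random
import statistics

N_MC = 3_000
RATIOS = ("Kr/N2", "Xe/N2", "Xe/Kr")
FIRN_SCENARIOS = ("isotope-based", "model-based")
UNDERSAT_SCENARIOS = ("constant", "50% change")


def toy_lgm_anomaly(undersat: str, rng: random.Random) -> float:
    """HYPOTHETICAL stand-in for one inversion (degC): a scenario-dependent
    centre plus normally distributed measurement and sea-level errors."""
    centre = -2.6 + (0.2 if undersat == "50% change" else 0.0)
    measurement_err = rng.gauss(0.0, 0.15)
    sea_level_err = rng.gauss(0.0, 0.05)
    return centre + measurement_err + sea_level_err


def run_ensemble(seed: int = 0) -> dict:
    """Map (ratio, firn, undersaturation) -> list of N_MC realizations."""
    rng = random.Random(seed)
    return {
        key: [toy_lgm_anomaly(key[2], rng) for _ in range(N_MC)]
        for key in itertools.product(RATIOS, FIRN_SCENARIOS, UNDERSAT_SCENARIOS)
    }


if __name__ == "__main__":
    ensemble = run_ensemble()
    pooled = [x for runs in ensemble.values() for x in runs]
    print(len(pooled), round(statistics.median(pooled), 2))
    # Report the undersaturation scenarios separately (non-stochastic spread).
    for scenario in UNDERSAT_SCENARIOS:
        runs = [x for k, v in ensemble.items() if k[2] == scenario for x in v]
        print(scenario, round(statistics.mean(runs), 2), round(statistics.stdev(runs), 2))
```

Keeping the undersaturation scenarios apart at the end, rather than only pooling them, mirrors the text's choice to treat that uncertainty as non-stochastic; how the final ± range is assembled from them is not reproduced here.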

In Extended Data Table 1 we list three more elements that are not included in our MOT records but are discussed here for completeness. As described in ref. 22, the glacial ocean circulation might have been characterized by an AABW cell approximately 1 PSS saltier owing to missing freshwater input from melting sea ice in the Southern Ocean. As the salt content can be assumed to be conserved in the ocean on these timescales, the additional salt in AABW has to be provided by NADW and RES. Owing to the salinity dependence of the solubility functions, such a salinity redistribution changes the relative weights of the water masses of different temperatures in the MOT reconstruction. We tested this effect by applying a salinity anomaly of 1 PSS to our AABW cell (compensated by NADW and RES in equal shares) and find a small effect of only −0.02 °C on the LGM MOT estimate.
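The compensating salinity changes follow from conserving salt across the three boxes. This sketch conserves the volume-weighted salinity as a proxy for salt content (density differences ignored), uses the modern volume fractions, and reads "equal shares" as an equal salinity decrease in NADW and RES; those choices and the function name are our assumptions.

```python
FRAC = {"AABW": 0.35, "NADW": 0.20, "RES": 0.45}   # modern volume fractions


def salinity_anomalies(aabw_anomaly_pss: float = 1.0,
                       fractions: dict = FRAC) -> dict:
    """Salinity anomaly of each box such that the volume-weighted
    salinity (our proxy for total salt) is unchanged."""
    donors = [name for name in fractions if name != "AABW"]
    salt_needed = fractions["AABW"] * aabw_anomaly_pss
    donor_volume = sum(fractions[name] for name in donors)
    shared = -salt_needed / donor_volume          # same decrease in each donor
    anomalies = {"AABW": aabw_anomaly_pss}
    anomalies.update({name: shared for name in donors})
    return anomalies


if __name__ == "__main__":
    print({k: round(v, 3) for k, v in salinity_anomalies().items()})
```

A +1 PSS AABW anomaly then requires about −0.54 PSS in both NADW and RES.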

Another aspect we test is the potential bias caused by a large floating ice shelf. Noble gases dissolve essentially only in the liquid phase of the ocean, but the sea-level change record captures only ice stored on land and not the reduction in liquid ocean volume caused by floating ice. We assume an ice shelf with the extent of the modern winter sea ice around Antarctica and a thickness of 200 m. Such an ice shelf would be enormous, and we have no evidence that one this large could have existed. Even so, its effect on the LGM MOT estimate would be only −0.1 °C, showing that this potential bias is also of minor relevance.
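A quick estimate shows how small this hypothetical ice shelf is relative to the ocean. The 200 m thickness and the 1.34 × 10¹⁸ m³ ocean volume come from the text; the extent of about 18 million km² for the modern winter Antarctic sea ice is an assumed round figure, not a value from this study.

```python
# Illustrative inputs. SEA_ICE_EXTENT_KM2 is an ASSUMED round figure for the
# modern winter Antarctic sea-ice extent, not a value from this paper.
SEA_ICE_EXTENT_KM2 = 18e6
SHELF_THICKNESS_M = 200.0        # from the text
OCEAN_VOLUME_M3 = 1.34e18        # modern ocean volume used in the box model


def floating_ice_fraction(extent_km2: float = SEA_ICE_EXTENT_KM2,
                          thickness_m: float = SHELF_THICKNESS_M,
                          ocean_volume_m3: float = OCEAN_VOLUME_M3) -> float:
    """Ice volume of the hypothetical shelf as a fraction of ocean volume.
    Ignores the roughly 10% density difference between ice and seawater."""
    ice_volume_m3 = extent_km2 * 1e6 * thickness_m   # km^2 -> m^2
    return ice_volume_m3 / ocean_volume_m3


if __name__ == "__main__":
    print(f"{100 * floating_ice_fraction():.2f} % of the ocean volume")
```

Under these assumptions the shelf locks up only about 0.27% of the ocean volume, consistent with the small MOT effect reported above.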

The last row in Extended Data Table 1 shows the effect of the applied 2% correction of the Xe solubility function compared with the case in which we do not apply this correction. Because the noble gases are conserved in the model, this temperature-independent change in the solubility function of Xe leads to a slight change in the MOT sensitivity of the ratios involving Xe (Xe/N2 and Xe/Kr). The effect on the LGM MOT estimate, however, would be only 0.04 °C and 0.07 °C, respectively, showing that the results presented here are not much affected by this existing uncertainty in the Xe solubility. Kr is about a factor of two less soluble in seawater than Xe, and the solubility function of Kr is better constrained than that of Xe. For these reasons, the effect of the uncertainty in the Kr solubility function on the LGM MOT estimate is much smaller than that shown for Xe in Extended Data Table 1 and can therefore be neglected.

Scaling MOT to surface temperatures based on global climate models. MOTs are set by surface ocean temperatures, which in turn are related to global surface temperatures. The connection between surface and ocean interior temperature changes, however, also depends on the climatology (polar amplification, ocean circulation, location of deep-water formation areas, and so on), which differs between glacial and interglacial periods. The constraints on the glacial climatology are fairly weak, and the realization of such a climatology can differ greatly from one climate model to another. Therefore, we use several independent climate models that provide climatologies for glacial and interglacial conditions and calculate the scaling factors from MOT to ASST and GAST changes, respectively (see ΔASST/ΔMOT and ΔGAST/ΔMOT in Extended Data Table 2).

Such glacial-interglacial climate model experiments are part of the Paleoclimate Modelling Intercomparison Project (PMIP), whose output can be accessed openly via one of the Coupled Model Intercomparison Project (CMIP) data nodes. All results in Extended Data Table 2 are based on model output from the PMIP3 project (ensemble: r1i1p1; see ref. 71 for more details about the CMIP5/PMIP3 experiments), with the exception of the Bern3D model results, which were provided for this study. From the PMIP3 results, we used the following variables from the LGM and the Pre-industrial Control experiments: (1) global averaged sea water potential temperature (thetaoga), (2) sea water potential temperature (thetao), and (3) near-surface air temperature (tas). Where available, we averaged the thetaoga data to derive MOT. If only thetao (a three-dimensional field) was available, we averaged over the time dimension covered by the corresponding dataset (12 months) and then over the spatial dimensions, weighting the cell values by the corresponding cell volumes. ASST was calculated by first excluding all surface cells in thetao that are more than 50% covered by sea ice, followed by the same temporal and spatial averaging as for MOT. Our ASST values therefore represent open-ocean surface temperatures, excluding the sea-ice-covered areas, where heat exchange with the atmosphere is negligible and the surface ocean temperature is fixed at the salinity-dependent freezing point. GAST was calculated by averaging the (two-dimensional) tas fields.
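The averaging steps can be sketched with NumPy on plain arrays. This is an illustration under stated assumptions, not the authors' processing code: arrays are ordered (month, depth, lat, lon) with the surface at depth index 0, land cells are NaN, the sea-ice field is a concentration in percent (such as CMIP's `sic`), the 50% mask is applied month by month, and the caller supplies cell volumes and areas (area weighting of tas is our assumption). All function names are hypothetical.

```python
import numpy as np


def _time_mean(field: np.ndarray) -> np.ndarray:
    """Mean over axis 0 (months) ignoring NaNs; NaN where no month is valid."""
    valid = np.isfinite(field)
    count = valid.sum(axis=0)
    total = np.where(valid, field, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)


def _weighted_mean(field: np.ndarray, weights: np.ndarray) -> float:
    """Weighted mean over all finite cells of `field`."""
    ok = np.isfinite(field)
    return float(np.sum(field[ok] * weights[ok]) / np.sum(weights[ok]))


def mot_from_thetao(thetao: np.ndarray, cell_volume: np.ndarray) -> float:
    """thetao: (month, depth, lat, lon); cell_volume: (depth, lat, lon)."""
    return _weighted_mean(_time_mean(thetao), cell_volume)


def asst_from_thetao(thetao: np.ndarray, sea_ice_pct: np.ndarray,
                     cell_area: np.ndarray, threshold_pct: float = 50.0) -> float:
    """Open-ocean surface temperature: surface cells with more than
    threshold_pct sea-ice cover in a month are dropped for that month."""
    surface = np.where(sea_ice_pct > threshold_pct, np.nan, thetao[:, 0, :, :])
    return _weighted_mean(_time_mean(surface), cell_area)


def gast_from_tas(tas: np.ndarray, cell_area: np.ndarray) -> float:
    """tas: (month, lat, lon) near-surface air temperature."""
    return _weighted_mean(_time_mean(tas), cell_area)
```

Averaging in time first and in space second, as in the text, gives the same result as the reverse order only when no cells are masked; with a monthly sea-ice mask the order matters, which is why it is kept explicit here.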

The results in Extended Data Table 2 show that the LGM-Holocene MOT difference varies strongly from model to model, mainly owing to discrepancies in the LGM values. This shows that the models provide quite different climatologies, in particular for LGM conditions. The range of these model results can therefore be read as a measure of how strongly differences in climatology affect the scaling factors between the globally averaged parameters calculated here. The ΔASST/ΔMOT scaling factor varies from 0.67 to 0.89 with an ensemble average of 0.80. The ΔGAST/ΔMOT scaling factor varies from 1.96 to 2.92 with an ensemble average of 2.50.
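Applying these factors to the reconstructed LGM-Holocene MOT difference of 2.57 °C is simple arithmetic. This sketch scales only the central MOT value; it does not propagate the ±0.24 °C MOT uncertainty or treat the model spread statistically, and it is not necessarily how the main text derives its surface estimates.

```python
D_MOT_C = 2.57   # LGM-Holocene MOT difference quoted above, degC

# Scaling factors from the model ensemble, as quoted in the text.
SCALING = {
    "ASST": {"min": 0.67, "mean": 0.80, "max": 0.89},
    "GAST": {"min": 1.96, "mean": 2.50, "max": 2.92},
}


def scaled_changes(d_mot: float = D_MOT_C) -> dict:
    """Surface-temperature change (degC) implied by each scaling factor."""
    return {target: {k: d_mot * f for k, f in factors.items()}
            for target, factors in SCALING.items()}


if __name__ == "__main__":
    for target, values in scaled_changes().items():
        print(target, {k: round(v, 2) for k, v in values.items()})
```

The ensemble-mean factors map 2.57 °C of MOT change to about 2.1 °C in ASST and about 6.4 °C in GAST, with model ranges of roughly 1.7-2.3 °C and 5.0-7.5 °C, respectively.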

In general, the models underestimate the MOT difference between the LGM and the Holocene, with an ensemble average of 1.60 °C and a range from 0.92 °C to 1.95 °C. This raises the question of whether the large spread of the scaling factors is correlated with the absolute LGM-Holocene MOT difference and, hence, may contain a bias. However, there is no correlation between the absolute LGM-Holocene MOT difference and the scaling factors, so any possible bias in these scaling factors is believed to lie within the model spread.

Hypothesis behind the Younger Dryas MOT anomaly. As discussed in the main text, our MOT record shows a phase of exceptionally strong and fast warming during the first half of the Younger Dryas (referred to as YD1). Here we discuss two possible underlying mechanisms.

One factor that might underlie the strong MOT warming/heat uptake during YD1 is the strong insolation in high latitudes associated with the phase of high obliquity around YD1 (Fig. 3). In the latitudes where deep waters are formed, the local annual averaged heat flux was about 1.5 W m⁻² higher than during the LGM. The additional heat flux could have led to increased warming of surface waters near the deep-water formation areas during the summer seasons, which would then have been transported into the deep ocean during the winter seasons, when deep-water formation mainly occurs. The pattern of the YD1 warming, however, is not consistent with the gradual insolation change, so additional processes must have been at work. For the period before the YD1 warming and its abrupt start, the change in AMOC state can provide such an explanation: before YD1 the strong AMOC pulled warm waters towards the north, preventing warming of the deep (southerly ventilated) ocean. The collapse/weakening of the AMOC at the beginning of YD1 stopped this northward heat pull and thus triggered the rapid YD1 warming. The end of the YD1 warming, however, occurs considerably before the end of the Younger Dryas, when the AMOC accelerates again, so the AMOC cannot explain this part of the observation. Note that these orbital-driven heat flux changes are fairly small relative to the baseline flux of about 234 W m⁻² (today). Hence, they might have been of only minor importance for the YD1 MOT anomaly.
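The statement that the orbital forcing is small can be made concrete with a one-line ratio, comparing the quoted anomaly (written here as ΔF, a symbol introduced only for this comparison) with today's baseline flux F₀:

$$
\frac{\Delta F}{F_0} \approx \frac{1.5\ \mathrm{W\,m^{-2}}}{234\ \mathrm{W\,m^{-2}}} \approx 6.4 \times 10^{-3} \approx 0.6\%
$$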

Another hypothesis that could explain the MOT pattern during the Younger Dryas is that a cold, isolated water mass was ventilated during YD1. This water mass would have last been ventilated several millennia earlier, for example during the cold LGM, and only the push of the Younger Dryas onset (collapse of the AMOC) would have brought this cold water up to the surface to equilibrate. The end of YD1 would then mark the point in time when this water mass was fully ventilated, so this scenario would also explain the stalled warming before the AMOC acceleration. Such a drastic change in ocean ventilation could be explained by a switch from a glacial ocean circulation mode to a modern/interglacial mode, as mentioned in the main text. Multiple lines of evidence suggest the existence of such different ocean circulation modes, and for the shift from the interglacial to the glacial mode, the MIS 5-4 transition at around 70 kyr BP has been suggested. YD1 could be the counterpart of the MIS 5-4 transition, providing a relatively sharp definition of the last glacial period from an ocean circulation perspective.

Data availability. All relevant data from the ice samples (noble gas elemental and 
isotope ratios) are provided as Supplementary Data; the corresponding recon- 
structed mean ocean temperatures are provided as Source Data for Figs 2 and 3 
and Extended Data Figs 1 and 2. 

Code availability. The ocean box model, including the Monte Carlo code (Matlab), 
is available ‘as is’ from the corresponding author on request. Details of the ocean 
box model can also be found in refs 10 and 11. 
Extended Data Figure 1 | Elements related to the gravitational and thermal correction applied to the ice core data. a, Residual of the isotope data after correction for gravitational enrichment in the firn based on δ⁴⁰Ar (orange) and modelled firn thermal gradients (b, green). In contrast to δ¹⁵N (black), δ⁸⁶Kr (purple) clearly deviates from the zero line, by −56 per meg on average, showing that our correction factors for δ⁸⁶Kr are overestimated (δ⁴⁰Ar is zero by definition because we use these data for the correction). Error bars represent the 1σ analytical uncertainty of our method based on repeated measurements of modern air samples. b, The two independent WAIS Divide ice core site firn thermal gradient scenarios used in this study. The blue trace represents the scenario derived from our isotope data for δ¹⁵N, δ⁴⁰Ar and δ⁸⁶Kr, after first correcting δ⁸⁶Kr for the offset seen in a. The green trace represents the model-based scenario and originates from ref. 36.
Extended Data Figure 2 | Raw atmospheric noble gas elemental ratios and relative differences between individual MOT records. a, Reconstructed atmospheric elemental ratios (orange, δKr/N2; red, δXe/N2; purple, δXe/Kr) using δ⁴⁰Ar to correct for gravitational enrichment in the firn, and using the firn thermal gradient scenario based on our isotope data (see Extended Data Fig. 1) to correct for thermal fractionation. The error bars are 1σ. b, Differences in MOT derived from each of the three individual gas ratios relative to the best-estimate (Mix) data (compare with Fig. 1; orange, Kr/N2 versus Mix; red, Xe/N2 versus Mix; purple, Xe/Kr versus Mix).
Extended Data Table 1 | Effects of box-model elements on the LGM-Holocene MOT difference
Columns: LGM values relative to Holocene; element-specific effect on LGM-Holocene MOT difference.
Sea-level change (SLC) effects are most important, but other effects are also listed. SSP, sea surface pressure.
*These elements are not considered in our MOT record (see Methods).
Extended Data Table 2 | Simulated ocean and surface temperatures
PiC, Pre-industrial Control. MOT and ASST are calculated by averaging the potential temperature fields of the corresponding experiments in time and space (see main text), with the sea-ice-covered area excluded for ASST. GAST is calculated by similar averaging of the corresponding air temperature fields. Values marked # and * are the highest and lowest values of the corresponding row, respectively. The 'Ensemble Mean' column shows the average of the seven models summarized in this table: Bern3D, CNRM-CM5, CCSM4, FGOALS, MIROC, MPI and MRI.
The ΔGAST/ΔMOT and ΔASST/ΔMOT scaling factors of the FGOALS model are rejected because the former would imply an unrealistically cold LGM GAST, 11 °C below today's, and because both values are outliers with respect to the corresponding values of the other models. Detailed information about the individual models and the output data we used can be found on any publicly accessible data server node of the CMIP project (such as https://esgf-data.dkrz.de).
It has been hypothesized that a condensed nervous system with a medial ventral nerve cord is an ancestral character of Bilateria. The presence of similar dorsoventral molecular patterns along the nerve cords of vertebrates, flies, and an annelid has been interpreted as support for this scenario. Whether these similarities are generally found across the diversity of bilaterian neuroanatomies is unclear, and thus the evolutionary history of the nervous system is still contentious. Here we study representatives of Xenacoelomorpha, Rotifera, Nemertea, Brachiopoda, and Annelida to assess the conservation of the dorsoventral nerve cord patterning. None of the studied species show a conserved dorsoventral molecular regionalization of their nerve cords, not even the annelid Owenia fusiformis, whose trunk neuroanatomy parallels that of vertebrates and flies. Our findings restrict the use of molecular patterns to explain nervous system evolution, and suggest that the similarities in dorsoventral patterning and trunk neuroanatomies evolved independently in Bilateria.


The nervous systems of Bilateria, in particular their trunk neuroanatomies, are morphologically diverse (Fig. 1a). Groups such as arthropods, annelids, and chordates exhibit a medially condensed nerve cord, which is ventral in arthropods and annelids, and dorsal in chordates. By contrast, other lineages have multiple paired longitudinal nerve cords distributed at different dorsoventral levels. There are even bilaterians with only weakly condensed basiepidermal nerve nets, similar to those in cnidarians (Fig. 1a), which supports the idea that this net-like neural arrangement predates the Cnidaria-Bilateria split (Fig. 1a). However, the earliest configuration of the bilaterian central nervous system (CNS) is still debated (Fig. 1a), and thus it is unclear when and how often nerve cords evolved in Bilateria.

The conserved deployment of signalling molecules and transcription factors along the bilaterian anteroposterior and dorsoventral axes grounds most scenarios for the evolution of the CNS. In particular, the similar expression of the transcription factors nkx2.1/nkx2.2, nkx6, pax6, pax3/7, and msx in the ventral neuroectoderm of the fly Drosophila melanogaster and the annelid Platynereis dumerilii, and in the dorsal neural plate of vertebrates (Fig. 1b), is a core argument for proposing an ancestral CNS comprising a medial ventral nerve cord (VNC) in Bilateria. In P. dumerilii and vertebrates, and to some extent in Drosophila, the staggered expression of these genes correlates with the spatial location of neuronal cell types along their trunks. Serotonergic neurons form in the ventromedial nkx2.2+/nkx6+ region, cholinergic motor neurons develop in the nkx6+/pax6+ area, and dbx+ interneurons and lateral sensory trunk neurons differentiate in the more dorsolateral pax6+/pax3/7+ and pax3/7+/msx+ domains, respectively (Fig. 1b). The dorsoventral arrangement of these transcription factors and neuronal cell types is absent in hemichordates, nematodes, and planarians, consistent with the idea that the most recent common ancestor of Bilateria had a dorsoventrally patterned, medially condensed VNC that has been repeatedly lost in these and perhaps other groups. However, there is an alternative explanation: that a CNS with a single nerve cord and the similar dorsoventral patterning is the trait that repeatedly evolved, and thus was absent in the most recent common bilaterian ancestor.


Neuroectodermal patterning in Xenacoelomorpha 

To explore the conservation of neuroectodermal patterning systems in Bilateria, we first studied Xenacoelomorpha (Extended Data Fig. 1), which is the sister group to all remaining bilaterian lineages (that is, Nephrozoa). We focused our analyses on Xenoturbella bocki, the nemertodermatids Meara stichopi and Nemertoderma westbladi, and the acoel Isodiametra pulchra. As in the acoel Hofstenia miamia and most other bilaterians, these xenacoelomorphs differentially express anteroposterior marker genes along their primary body axis (Extended Data Figs 2a, c and 3). The bone morphogenetic protein (BMP) pathway, which has an ancestral dorsoventral patterning role and an anti-neural role in Drosophila and vertebrates, is also similarly deployed in all studied xenacoelomorphs, with bmp ligands expressed dorsally and antagonists located more ventrolaterally (Fig. 2a, d and Extended Data Figs 2d and 4). However, the dorsoventral transcription factors that we found in our genomic resources (Supplementary Table 1) did not show a clear staggered expression (Fig. 2b, e). Therefore, Xenacoelomorpha only exhibits the anteroposterior and BMP ectodermal patterning systems, which is reminiscent of the cnidarian condition.

Importantly, ectodermal patterning systems are deployed inde- 
pendently of the trunk neuroanatomy in Xenacoelomorpha. Similar 
to cnidarians, xenacoelomorphs have a uniformly distributed, diffuse 
basiepidermal nerve net*”°-?”. Xenoturbella species only have this 
network?°. However, nemertodermatids have additional longitudinal 
basiepidermal nerve cords”°, located dorsally in M. stichopi?® (Fig. 2c), 
and ventrally in N. westbladi (Extended Data Fig. 2e). The acoel 
I. pulchra also has four pairs of subepidermal nerve cords distributed 
along the dorsoventral axis’’ (Fig. 2f). Genes commonly involved 
in neurogenesis (Extended Data Fig. 5a, d) and neural transmission 
(Extended Data Figs 2b, f and 5b, c, e) are consistently expressed in the 


1Sars International Centre for Marine Molecular Biology, University of Bergen, Tharmohlensgate 55, 5006 Bergen, Norway. *Natural History Museum of Denmark, Biosystematics Section, 
Universitetsparken 15, DK-2100 Copenhagen, Denmark. *Naturhistoriska Riksmuseet, PO Box 50007, SE-104 05 Stockholm, Sweden. 
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Figure 1 | CNS evolution and dorsoventral patterning. a, A nerve net 

is ancestral for Cnidaria and Bilateria. The neuroanatomical diversity 
hampers the reconstruction of the CNS evolution in Bilateria. b, A central 
argument for an ancestral medially condensed VNC for Bilateria is the 


sensory structures and neural condensations in these species. However, 
the dorsoventral transcription factor nkx6 does not co-localize with the 
motor neuron marker ChAT in the trunk of M. stichopi and I. pulchra, 
and the relation of pax6* cells to this and another motor neuron 
marker (Hb9) is unclear in both species (Fig. 2b, e). Therefore, the 
diversity of neuroanatomies of Xenacoelomorpha contrasts with the 
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similar deployment of dorsoventral transcription factors in vertebrates, 
Drosophila, and the P. dumerilii larva. The staggered expression of these 
genes concurs with specific neuronal populations. D, dorsal; V, ventral; 
A, anterior; P, posterior; 5-HT, serotonin; ACh, acetylcholine. 


more conserved deployment of ectodermal anteroposterior and BMP 
patterning systems. This, and the observation that disruption of BMP 
signalling does not affect CNS development (Extended Data Fig. 6), 
support the idea that the anti-neural role of the BMP pathway evolved 
after the Xenacoelomorpha-—Nephrozoa split. Likewise, the expression 
of dorsoventral transcription factors unrelated to the distinct trunk 


Figure 2 | Dorsoventral patterning in Xenacoelomorpha. a, The 

bmp ligands and admp are expressed dorsally; chd is expressed 
ventroposteriorly. b, Transcription factors nkx2.1, nkx6, and msx are 
expressed ventrally; paxé is expressed broadly; Hb9 and ChAT are in the 
nerve cords. c, M. stichopi CNS (green arrowheads indicate the anterior 
commissures; red arrowheads indicate the nerve cords). Tyr-tubulin, 
tyrosinated tubulin. d, The bmp ligands are expressed dorsally; admp-a 
is expressed posteroventrally; admp-b is expressed anterolaterally. e, The 
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nkx2.1 paralogues and nkx2.2 are expressed ventrally; nkx6 is expressed 
laterally; pax6 throughout the body; msx in isolated cells; Hb9 and 

ChAT are in the brain. f, I. pulchra CNS (green arrowheads indicate the 
brain; red arrowheads indicate the nerve cords). Insets are lateral views. 
Abbreviations: ac, anterior commissure; bnn, basiepidermal nerve net; 
dnc, dorsal nerve cord; Inc, lateral nerve cord; np, neuropile; pc, posterior 
commissure; st, statocyst; vlnc, ventrolateral nerve cord; vnc, ventral nerve 
cord. Scale bars, 100 1m. 
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Figure 3 | Dorsoventral patterning in 
Brachiopoda. A, Transcription factors nkx2.2 
and nkx6 are in the trunk midline (arrowheads), 
posterior tip (arrows), gut, and apical cells (nkx6); 
pax3/7 is expressed laterally (arrowheads) and in 
the apical lobe (arrow); msx is in the mantle and 
ventral pedicle. B, There is an nkx2.2+/nkx6* 
medioventral region, and a more lateral 
nkx6*/pax6*/pax3/7* anterior trunk domain. 

C, T: transversa larval CNS (green arrowheads 
indicate the neuropile in a, and the trunk 

wn serotonergic condensation in b; red arrowheads 
mark the VNCs; pink arrowheads indicate 

the innervation of the chaetae). D, Only tph is 
expressed in the trunk (arrows and arrowheads 
indicate expression areas). E, Transcription factors 


neuroanatomies suggests that the dorsoventral patterning of the nerve 
cords also evolved after the Xenacoelomorpha—Nephrozoa split. 


Dorsoventral patterning in Brachiopoda 
To investigate the conservation of the dorsoventral nerve cord pat- 
terning in Nephrozoa, we focused on Spiralia”’, one of the three major 
nephrozoan clades. Although some lineages have a medially condensed 
VNC (that is, Annelida), a main pair of lateral VNCs is widespread 
and probably homologous in Spiralia®. We first studied the brachiopod 
Terebratalia transversa, in which we identified staggered expression of 
dorsoventral transcription factors in the anterior ventral midline of 
the larval trunk. At this stage, nkx2.1 (ref. 30) and pax6 (ref. 31) are 
expressed in the apical lobe, albeit pax6 expression projects slightly into 
the mantle lobe. However, there is a medial nkx2.2+/nkx6* domain, a 
more lateral nkx6‘/pax6*/pax3/7* region, and a broad, dorsolateral 
msx* area in the anterior ventral ectoderm of the larval ‘trunk’ (that 
is, mantle and pedicle lobes) (Fig. 3A, B and Extended Data Fig. 7a). 
Additionally, a narrow line of cells below the apical-mantle boundary 
crossing the ventral midline expresses pax3/7 (Fig. 3A, B and Extended 
Data Fig. 7a). These expression domains disappear in the highly modi- 
fied adult body (Extended Data Fig. 7a—c). The staggered expression of 
dorsoventral transcription factors in the ventral anterior ectoderm of 
the trunk only partly correlates with the larval neuroanatomy, which 
consists of an anterior condensation and a medial accumulation of sero- 
tonergic cells on the ventral side, from which pairs of neurites innervate 
the chaetae and posterior end (Fig. 3C). The dorsoventral transcription 
factors do not co-express with most neuronal markers!?, which are 
mostly expressed in the anterior region (Fig. 3A, D and Extended Data 
Fig. 7a, d). Only two tph* clusters in the medial serotonergic conden- 
sation of the larval trunk co-localize with the nkx2.2+/nkx6* medial 
domain. Therefore, the brachiopod T. transversa resembles vertebrates, 
arthropods, and P. dumerilii in the presence of a ventral serotonergic 
nkx2.2*/nkx6*+ area*®!232, as well as in the nkx6, pax, pax3/7,and msx 
dorsolateral domains, which are, however, not apparently connected to 
any neural trunk structure. 

The staggered ectodermal expression of dorsoventral transcription 
factors in the anteroventral trunk of T: transversa is largely conserved 
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nkx2.2 and nkx6 are in the trunk ventral midline 
(arrowheads), apical lobe (arrows), and gut; 
pax3/7 is in the mesoderm (arrows), and in two 
ventrolateral trunk domains (arrowheads); msx 
is in the trunk, shell epithelium (arrowhead), and 
mesoderm (arrows). F, N. anomala larval CNS 
(green arrowhead indicates the neuropile; red 
arrowheads in a mark the VNCs; red arrowheads 
in b indicate the innervation of the chaetae). 
Abbreviations: ao, apical organ; bp, blastopore; 
ch, chaetae; mo, mouth; np, neuropile; vnc, ventral 
nerve cord. Scale bars, 501m. 
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Figure 4 | Dorsoventral patterning in Nemertea. a, Transcription 
factors nkx2.1 and nkx2.2 are in the head (arrows), proboscis (nkx2. 1), 
and trunk cells (nkx2.2); nkx6 and pax6 are in the head (arrows) and 
VNCs (arrowheads); pax3/7 is broadly expressed. Neuronal markers are 
in the brain (arrows) and VNCs (arrowheads). b, In the VNCs, nkx2.2* 
cells express tph, but not nkx6; nkx6* cells express pax3/7 and Hb9, but 
not VAcHT. c, L. ruber CNS (green arrowheads indicate the brain; red 
arrowheads mark the VNCs and the dorsal neurite in the upper inset). 
Abbreviations: br, brain; dnc, dorsal nerve cord; mo, mouth; tr, trunk; 
pb, proboscis; vnc, ventral nerve cord. Scale bar, 100 1m. 
Figure 5 | Dorsoventral patterning in Rotifera and Annelida.

a, Transcription factor nkx2.1 paralogues, nkx2.2, and pax6 show brain 
domains (arrowheads); nkx6 is detected posteriorly (arrowheads). 

b, E. senta CNS (green arrowhead indicates the brain; red arrowheads 
indicate the VNCs, and additional neurites). c, Transcription factor 
nkx2.2 shows gut expression (arrowhead); nkx6 is in the ventral midline 
(arrowheads); pax6 is in two lateral larval bands (arrowheads), and 
juvenile head; pax3/7 is in two ventrolateral larval clusters and midline 


in the brachiopod Novocrania anomala. In this brachiopod, nkx2.1
(ref. 30) and pax6 (ref. 31) are expressed in the apical lobe, and
nkx2.2 and nkx6 are expressed medially in the trunk (Fig. 3e). As in
T. transversa, nkx6 extends more laterally at the anterior trunk, where it
co-localizes with pax3/7 in the early larva, and msx is broadly detected
in the trunk (Fig. 3e and Extended Data Fig. 7e). Therefore, N. anomala
also has a medial ventral nkx2.2+/nkx6+ domain; remarkably, however,
this domain does not co-localize with any serotonergic condensation,
which is lacking in the larval CNS of this brachiopod (Fig. 3f).
Thus, the conserved staggered expression of the dorsoventral
transcription factors in the anteroventral larval trunk is not necessarily
connected to the CNS, suggesting that this system may instead pattern
only the ectoderm in Brachiopoda.


Dorsoventral patterning in Nemertea 

Similar to brachiopods, some dorsoventral transcription factors show 
staggered expression along the trunk ventral side of the nemertean 
Lineus ruber. In this worm, dorsoventral transcription factors are 
first detected in the larval imaginal discs (Extended Data Fig. 8a). 
In metamorphic and definitive juveniles, nkx2.1 is expressed in the 
(arrowheads), but in two trunk clusters in juveniles (arrowheads);
msx paralogues are in ventral larval domains and the juvenile VNC
(arrowheads). d, O. fusiformis CNS (green arrowheads indicate the apical
larval FMRFamide cell and juvenile brain; red arrowheads indicate the
larval anterior axon and juvenile medial VNC). Abbreviations: ao, apical
organ; br, brain; cb, ciliary band; cg, caudal ganglion; ch, chaetae; ln, lateral
neurites; mo, mouth; ms, mastax; np, neuropile; vg, vesicle ganglia;
vnc, ventral nerve cord. Scale bars, 50 µm.


head and proboscis, and pax3/7 is broadly expressed (Fig. 4a and
Extended Data Fig. 8a). However, nkx2.2, nkx6, and pax6 are detected
in isolated ventrolateral cells, as well as in cephalic domains (nkx2.2,
nkx6, pax6) and isolated trunk cells (nkx2.2) (Fig. 4a and Extended
Data Fig. 8a). Remarkably, nkx2.2 and nkx6 do not co-localize, but
nkx6 and pax6 do (Fig. 4b). These staggered domains relate to the
disposition of the VNCs of L. ruber (Fig. 4c). Furthermore, nkx2.2+
cells co-express the serotonergic marker tph, and nkx6+ cells express
the motor neuron marker Hb9, but not VAChT (Fig. 4a, b). Therefore,
the staggered expression of the dorsoventral transcription factors
nkx2.2, nkx6, and pax6 is linked to the ventral trunk CNS and some
neuronal cell type markers in L. ruber, which is similar to the situation
described in vertebrates and P. dumerilii*?!?*?.


Dorsoventral patterning in Rotifera 

To explore the conservation of the dorsoventral patterning in Spiralia, 
we studied the rotifer Epiphanes senta, a member of the sister lineage 
to all remaining Spiralia”’. Unlike the brachiopod larvae and
the nemertean juvenile, E. senta juveniles lack staggered expression
of dorsoventral transcription factors along their trunks. The three
Figure 6 | Dorsoventral patterning and CNS evolution. a, Gene
expression summary. Dotted lines indicate that the expression does not
extend along the entire trunk. b, Proposed scenario for the evolution of
neuroectodermal patterning systems in Bilateria. A nerve net and the
anteroposterior (AP; OA, oral–aboral) and BMP axial patterning systems predate
the Cnidaria–Bilateria split, and were present in the bilaterian ancestor.


nkx2.1 paralogues, nkx2.2, and pax6 are all in distinct brain domains 
of the juvenile rotifer (Fig. 5a). Only the gene nkx6 is detected in two 
posterior trunk cells (Fig. 5a). As in brachiopods and nemerteans, the 
trunk CNS comprises two VNCs, and additional paired dorsolateral 
nerves (Fig. 5b). The trunk expression of nkx6 probably corresponds 
to the vesicle ganglia’, but it is not related to motor neurons, as inferred 
by the expression of Hb9 and ChAT (Extended Data Fig. 9a). Therefore, 
spiralians with paired VNCs deploy the dorsoventral transcription
factors without a consistent association with their trunk neuroanatomies.


Dorsoventral patterning in Annelida 

To investigate the conservation of the dorsoventral patterning in 
Annelida, the only spiralian lineage with a medially condensed VNC!*, 
we studied the annelid O. fusiformis, which belongs to the sister lineage 
to all remaining annelids*’. Remarkably, this annelid deploys the
dorsoventral transcription factors differently from P. dumerilii'?**.
Besides the gut-related expression of nkx2.1 (ref. 30), nkx2.2, and
nkx6 in embryos and larvae, the ventral ectodermal midline expresses
nkx6, pax3/7, and two msx paralogues (Fig. 5c and Extended Data
Fig. 9b). Additionally, pax6 and pax3/7 show more lateral larval
expression domains (Fig. 5c). However, the ventral ectoderm of the juvenile
only expresses nkx6 and msx-b (Fig. 5c and Extended Data Fig. 9c). As
in most other annelids!, the adult CNS includes a VNC in O. fusiformis,
which is not yet present in the early larva* (Fig. 5d). In the juvenile,
only the expression of nkx6 and msx-b relates to the location of
serotonin (Fig. 5d) and motor neuronal markers (Extended Data Fig. 9d).
Therefore, the dorsoventral patterning system also varies among
annelids with a homologous condensed VNC, and between larval’? and
adult stages*# (Extended Data Fig. 10a).


Discussion 
Our study provides compelling evidence that the genes involved in 
the dorsoventral patterning of vertebrate, Drosophila, and P. dumerilii 




The ancestral nephrozoan neuroanatomy remains unclear (question 
mark). The dorsoventral (DV) patterning system is not tied to the CNS 
arrangement in Bilateria (as in Chordata and Annelida). In red, lineages 
analysed in this study. The green circle with red border indicates that there 
are annelids with and without the dorsoventral patterning. 


nerve cords do not show a similar staggered expression in the nerve 
cords of xenacoelomorphs and many spiralian lineages (Fig. 6a and 
Extended Data Fig. 10a, b). Although dorsoventral transcription factors 
define ectodermal domains in the larval brachiopod trunks and the 
nemertean juvenile (Fig. 6a), these do not necessarily correlate with 
the trunk CNS and the location of neuronal markers (Fig. 6a). Indeed, 
the cell lineage relationships between the early ectodermal
expression domains and specific neuronal cell types**’” are unclear, even in
Drosophila®**, and still need to be broadly and functionally tested. Our 
findings demonstrate that the expression of dorsoventral transcription 
factors not only differs between species with multiple nerve cords but 
also between spiralians that share a medially condensed homologous 
VNC. A similar case is observed among chordates, where the
cephalochordate** and tunicate*’ neural plates only partly show the vertebrate
molecular arrangement (Extended Data Fig. 10b and Supplementary 
Table 2), which is probably not a secondary loss given the absence 
of the dorsoventral patterning in Hemichordata!”!!. Therefore, the 
expression of dorsoventral transcription factors evolved independently 
from the trunk neuroanatomy at least in certain bilaterian lineages, 
which restricts the use of this patterning system to homologize CNS 
anatomies*”'? and neuronal cell types™. 

The similarities in the expression of anteroposterior and BMP 
patterning systems in Cnidaria and Bilateria”*?** suggest that these 
mechanisms predate the Cnidaria–Bilateria split (Fig. 6b). However,
these systems are deployed in organisms within these clades with 
diffuse nerve nets and/or centralized nervous systems, which indicates 
that their ancient role was probably general body plan regionalization’, 
and not CNS patterning and neurogenesis”. This also limits their use to 
homologize CNS anatomies. However, the evolution of the dorsoventral 
patterning of the nerve cords is more complicated (Extended Data 
Fig. 10c). If the similarities in dorsoventral CNS patterning between 
vertebrates, flies, and P. dumerilii are homologous and thus reflect the
ancestral bilaterian (or nephrozoan) state*”!*°, then this patterning 
system was independently lost or modified many times. The differences
between vertebrates and Drosophila in the upstream modulators of 
dorsoventral transcription factors and in their functional integration*”*® 
should thus be regarded as a case of developmental system drift®® over 
large phylogenetic distances. Alternatively, and more parsimoniously, 
these differences may indicate that the commonalities in dorsoventral 
nerve cord organization between vertebrates, arthropods, and some 
annelids evolved convergently (Fig. 6b and Extended Data Fig. 10c). 
The similar staggered expression domains of dorsoventral transcription 
factors in these three lineages, together with those uncovered by our 
study (Figs 3 and 4), might reflect the existence of ancient ectodermal 
gene regulatory sub-modules!®*7*! that got repeatedly assembled 
for the patterning of bilaterian nerve cords and neuronal cell type 
specification. Therefore, advancing our understanding of CNS
evolution largely relies on functionally identifying the developmental
implications of the anteroposterior and dorsoventral patterning systems in
diverse bilaterians, before they can be used to homologize particular 
morphological structures and cell types*“”. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Animal collections and sample fixations. Gravid adults were collected from 
the coasts near Friday Harbor Laboratories, San Juan Island, Washington, USA 
(T. transversa), Espeland Marine Biological Station, Norway (M. stichopi and 
N. anomala), Fanafjorden, Norway (L. ruber), Station Biologique de Roscoff, 
France (O. fusiformis), and Gullmarsfjord, Sweden (N. westbladi and X. bocki). 
P. Ladurner (University of Innsbruck) provided a stable culture of I. pulchra, 
which was maintained as previously described*’. A stable laboratory culture 
of E. senta was maintained in glass bowls with 25 ml of Jaworski’s medium in a
controlled environment of 20 °C and a 14:10 h light:dark cycle. The rotifers were fed
ad libitum with the algae Rhodomonas sp., Cryptomonas sp., and Chlamydomonas
reinhardtii. Brachiopod, nemertean, and annelid adults were spawned as described
elsewhere**4”. Acoelomorph eggs were collected year round (I. pulchra) and in
September–October (M. stichopi)**. All samples were fixed in 4% paraformaldehyde
in culture medium for 1 h at room temperature. After fixation, samples
were washed in 0.1% Tween 20 phosphate-buffered saline, dehydrated through a
graded series of methanol, and stored at −20 °C in pure methanol. Samples used
for immunohistochemistry were stored in Tween 20 phosphate-buffered saline at
4 °C. Before fixation, larval and juvenile stages were relaxed in 7.4% magnesium
chloride; E. senta individuals were relaxed in 10% EtOH and 1% bupivacaine. The eggshells of
M. stichopi and I. pulchra were permeabilized with 1% sodium thioglycolate
and 0.2 mg ml−1 protease for 20 min before fixation.

DMH1 treatments. M. stichopi and I. pulchra embryos were collected at the one- 
or two-cell stage and cultured with regular water changes in cell culture dishes 
until the desired developmental stage. Control embryos were treated with 0.1% 
dimethylsulfoxide and experimental embryos were treated with DMH1 (Sigma) 
up to 10 µM. Seawater containing DMH1 was changed every day until fixation.
Embryos and hatchlings were fixed as described above, and stored in Tween 20 
phosphate-buffered saline at 4 °C.

Gene identification and expression analyses. RNA sequencing data obtained 
from mixed developmental stages and juveniles/adults were used for gene
identification. Gene orthology was based on reciprocal best BLAST hits. For particular
gene families, maximum likelihood phylogenetic analyses were conducted with
RAxML version 8.2.6 (ref. 48), after building multiple protein alignments with
MAFFT version 7 (ref. 49) and trimming poorly aligned regions with Gblocks
version 0.91b (ref. 50) (Supplementary Fig. 1). Whole-mount colorimetric in situ
hybridization on brachiopod embryos, L. ruber, O. fusiformis, and juvenile E. senta
was performed following an already established protocol***. Probe concentrations
ranged from 0.1 to 1 ng µl−1, and permeabilization time was 15 min for M. stichopi
and post-metamorphic brachiopod juveniles, 5 min for I. pulchra, and 10 min for
the other species. Double fluorescent whole-mount in situ hybridization was
performed as described elsewhere*”. 
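Reciprocal best BLAST hit, the orthology criterion named above, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: it assumes the BLAST searches in both directions have already been run and reduced to (query, subject, bitscore) tuples, and the function names, transcript identifiers and scores below are hypothetical.

```python
from typing import Dict, List, Tuple

# One BLAST hit as (query_id, subject_id, bitscore); assumed to be already
# parsed from tabular BLAST output (hypothetical representation).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to the subject with its highest bitscore (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is, in turn, b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical identifiers and scores, for illustration only.
transcripts_vs_reference = [
    ("tx1", "NKX2-2", 310.0), ("tx1", "NKX6-1", 120.0),
    ("tx2", "NKX6-1", 290.0), ("tx3", "NKX6-1", 150.0),
]
reference_vs_transcripts = [
    ("NKX2-2", "tx1", 305.0), ("NKX6-1", "tx2", 285.0), ("NKX6-1", "tx3", 140.0),
]
print(reciprocal_best_hits(transcripts_vs_reference, reference_vs_transcripts))
# [('tx1', 'NKX2-2'), ('tx2', 'NKX6-1')]
```

Here tx3 is dropped because its best hit, NKX6-1, scores tx2 higher in the reverse search. Ranking hits by bitscore rather than E-value is an assumption of this sketch; for gene families where one-to-one best hits are ambiguous, the text above notes that phylogenetic analyses were used instead.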

Immunohistochemistry. Samples were permeabilized in 0.1–0.5% Triton X-100
in phosphate-buffered saline (PTx), and blocked in 0.1–1% bovine serum albumin in
PTx. The antibodies anti-tyrosinated tubulin (Sigma), anti-serotonin (Sigma), and
anti-FMRFamide (Immunostar) were diluted in 5% normal goat serum in PTx at
a concentration of 1:500, 1:200, and 1:200, respectively. Samples were incubated
with the primary antibody solutions for 24–72 h at 4 °C. After several washes in
1% bovine serum albumin in PTx, samples were incubated overnight with
Alexa-conjugated secondary antibodies at a 1:250 dilution in 5% normal goat serum in
PTx. Before mounting and imaging, samples were washed several times in 1%
bovine serum albumin in PTx. Nuclei and actin filaments were counterstained 
with 4’,6-diamidino-2-phenylindole (DAPI; Molecular Probes) and BODIPY FL 
Phallacidin (Molecular Probes). 

Imaging. Representative embryos from colorimetric in situ hybridization
experiments were cleared in 70% glycerol and imaged with a Zeiss Axiocam HRc
connected to a Zeiss Axioscope Ax10 using bright-field Nomarski optics. Fluorescently
labelled samples were cleared and mounted in benzyl benzoate/benzyl alcohol 
(2:1) and scanned in a Leica SP5 confocal laser-scanning microscope. Images were 
analysed with Fiji and Photoshop CS6 (Adobe), and figure plates were assembled 
with Illustrator CS6 (Adobe). Brightness/contrast and colour balance adjustments 
were applied to the whole image, not parts. 

Data availability. All newly determined sequences have been deposited in 
GenBank under accession numbers KY809717-KY809754, KY709718-KY709823, 
and MF988103-MF988108. Multiple protein alignments used for orthology assign- 
ment are available upon request from the corresponding author. Extended Data 
Fig. 6c has associated source data. 
Extended Data Figure 1 | Studied species. a–i, Images of the adult forms of the studied species within a consensus bilaterian phylogeny’®. Colour boxes
highlight major taxonomic clades. Scale bars, 100 µm in a–e; 0.5 cm in g and i; 1 cm in f, h, and i.
Extended Data Figure 2 | Gene expression in X. bocki and N. westbladi.
a, Two six3/6 paralogues are expressed in the anterior head margin in
X. bocki (arrowheads). b, The neural marker synaptotagmin (syt) is
detected in the circumferential (cf; inset 1) and side (sf; inset 2) sensory
furrows in X. bocki. c, In N. westbladi, the anterior marker sFRP1/5
(arrowhead) and the posterior genes gbx and wnt1 are asymmetrically
expressed along the anteroposterior axis. d, The BMP
ligands bmp2/4-a and bmp2/4-c are expressed dorsally, whereas the BMP
antagonist admp is expressed dorsolaterally. e, The CNS of N. westbladi
comprises an anterior ring-like commissure (green arrowheads) and a
main pair of ventral condensations (red arrowheads). f, The neuronal
marker syt is highly expressed in the anterior part (inset 1), and in the
nerve cords (inset 2). In the different panels, dotted rectangles indicate
magnified areas. In all panels, the anterior pole is to the left. The schematic
drawing in e is not to scale. Scale bar, 100 µm in e.
Extended Data Figure 3 | Anteroposterior patterning in
Xenacoelomorpha. a, b, Expression of anteroposterior markers in adult 
specimens of M. stichopi and I. pulchra. In both species, sFRP1/5, vax, 
six3/6, and BarH are expressed in anterior territories (black arrowheads). 
In M. stichopi, Rx is also expressed anteriorly, but broadly along the 
animal body in I. pulchra. In this acoel, emx is detected in the anterior 
part of the animal (background staining close to the gonads). In the 
nemertodermatid, the anterior neural markers otx, otp, pax2/5/8, and 
fezf are expressed along the entire anteroposterior axis, in association 
with the dorsal nerve cords (black dotted lines in otp). In I. pulchra, otx,
pax2/5/8-a, and pax2/5/8-b are broadly expressed. In M. stichopi, an irx
orthologue is detected in the posterior tip, whereas it is detected in the
anterior tip and around the mouth and copulatory apparatus in the acoel 
(arrowheads). The gbx orthologue of M. stichopi is expressed posteriorly, 
and the trunk-related Hox genes are expressed in two lateral rows (anterior 
Hox) and anteriorly to the mouth and in the posterior tip (posterior Hox). 
In the nemertodermatid and the acoel, Wnt ligand genes are expressed 
posteriorly (arrowheads). All images are dorsoventral views with anterior 
to the left. c, Schematic summary of anteroposterior expression in the 
nemertodermatid M. stichopi and the acoel I. pulchra. Drawings are not to 
scale and the extent of the expression domains is only approximate. The
expression of posterior Hox in I. pulchra is based on ref. 22. 
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smadt “Wie” smaca smad6 
Extended Data Figure 4 | Expression of BMP components in (elav1* cells), but in cells located medially to the nerve cords (tubulin 
Nemertodermatida and Acoela. a, In the nemertodermatid positive). The cells expressing bmp2/4 also express tsg, and cv2 is expressed 
M. stichopi, the BMP pathway antagonists twisted gastrulation (tsg) and dorsally along the nerve cords. ¢, In the acoel I. pulchra, the BMP 
crossveinless 2 (cv2) are expressed dorsally, whereas the antagonist BAMBI _ antagonist tld is expressed ventrally, bmpR-I is detected in the inner body, 
is broadly detected in the ventral side. The gene tolloid (tld) is expressed and bmpR-II is expressed anteriorly and posteriorly around the copulatory 
both dorsally and ventrally. The BMP receptor bmpR-I is expressed organ. The genes smad1 and smad4 are expressed generally, while smad6 is 
dorsolaterally and bmpR-II is detected more broadly. The genes smad1 expressed in two bilaterally symmetrical anterior clusters. All main panels 
and smad4 are expressed broadly and smad6 is expressed along the dorsal are dorsoventral views, and the insets are lateral views. 


nerve cords. b, The BMP ligand bmp2/4 is not expressed in neuronal cells 
Extended Data Figure 5 | Expression of neuronal markers in
Nemertodermatida and Acoela. a, In the nemertodermatid M. stichopi,
the genes associated with neuronal fate commitment, elav1, soxB2, ash1,
ash2, atonal, and neuroD, are detected along the dorsal nerve cords.
b, Similarly, the neuronal markers synaptotagmin (syt), tyrosine
hydroxylase (tyr), vesicular monoamine transporter (VMAT), choline
acetyltransferase (ChAT), vesicular acetylcholine transporter (VAChT), and
tryptophan hydroxylase (tph) are mostly expressed dorsally, along the
dorsal nerve cords. c, Morphology of I. pulchra embryos stained against
tyrosinated tubulin (Tyr Tub) and serotonin (5-HT), and counterstained
with phallacidin (actin bundles) and DAPI (nuclei). The first
tubulin-positive cells that resemble neurons appear anteriorly (arrowheads) at
24 h post-fertilization (hpf). By 32 hpf, the anterior and
posterior lobes of the brain, as well as some neurite bundles, are visible.
Similarly, the first serotonergic cells are detected at 24 hpf
in the anterior end (arrowheads). d, In I. pulchra, the pro-neural
marker elav1 is broadly expressed, soxB is detected in the head region
(arrowhead), and ash1b in the anterior tip (arrowhead). e, In I. pulchra,
the neuronal marker syt is highly expressed in the anterior neuropile. The
marker tyr is detected in the statocyst and isolated cells. VMAT is detected
in isolated dorsal cell clusters in the juvenile that concentrate along the
adult brain. ChAT and VAChT are expressed in the brain in juveniles and
adults (gonadal staining in the adult is background). The gene tph is
expressed in isolated ventral cells of the adult. All panels are dorsoventral
views with anterior to the left. Scale bars, 50 µm in c.
Extended Data Figure 6 | DMH1 treatments in M. stichopi and
I. pulchra. a, Schematic overview of dorsomorphin homologue 1 (DMH1)
treatments in M. stichopi and percentage of hatching embryos for each 
experimental condition. b, M. stichopi embryos incubated with DMH1 
from 3 to 8 weeks and after hatching show more serotonergic commissures 
than control animals. c, The differences in the number of commissures 
are significant in both pre-hatching (asterisk; two-tailed t-test; p<0.0001) 
and post-hatching (asterisk; two-tailed t-test; p<0.0014) treated embryos. 
In contrast, the number of serotonin-positive neurite bundles is not 
significantly increased in any of the treatments. d, Despite the abnormal 
development of serotonergic axonal tracts, slit and robo genes are 
expressed similarly. The differences in signal intensity are due to technical 
variability. e, Schematic overview of DMH1 treatments in I. pulchra and 
the percentage of hatching embryos for each experimental condition.
f, Morphological analyses of DMH1-treated embryos. Treatment in early
stages affects normal development, whereas treatments from 4 h onwards
do not significantly compromise embryogenesis. g, Embryos treated
between 0 and 4 h post-fertilization and fixed at 24 h of development show
expanded expression of the ventral marker nkx2.1, reduced expression of
the dorsal gene bmp2/4, and unaffected expression of the anterior marker
sFRP1/5. The embryo shows a disorganized morphology, as revealed by
actin staining. h, The expression of the ventral marker nkx2.1 is expanded
in early treated embryos (0-48 h), but unaffected in embryos treated after
4 h of development. In b, d, f–h, the asterisk marks the anterior pole. In
b, d, f, panels are dorsoventral views, and in g and h the panels are lateral
views.
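The legend above reports two-tailed t-tests on the commissure counts. As an illustration of what such a test computes, the Python sketch below implements a two-sample Student's t statistic with pooled variance and obtains the two-tailed P value by numerically integrating the t density with Simpson's rule. The legend does not say whether a pooled-variance or Welch test was used, so the pooled form is an assumption; the function names are hypothetical, and in practice a statistics package would normally be used.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pool the two sample variances (statistics.variance divides by n - 1).
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution, computed in log space for stability."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), using symmetry and Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central)
```

Because the t density is symmetric, the two tails together equal one minus twice the mass between 0 and |t|, which is what `two_tailed_p` integrates. Usage would be `t, df = student_t(treated_counts, control_counts)` followed by `two_tailed_p(t, df)`, where both count lists are placeholders for per-embryo commissure counts.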
Extended Data Figure 7 | Gene expression in Brachiopoda. a, Gene
expression during early gastrulation and elongation, and in late larvae
of T. transversa. The gene nkx2.2 is expressed ventroposteriorly (black
arrowhead) and in the pedicle lobe of the larva (arrow). The gene nkx6
is detected in two bilaterally symmetrical ectodermal posterior clusters
(arrowheads) and in the archenteron wall. In the larva, nkx6 is expressed
in the pedicle lobe (arrow) and midgut. pax3/7 is first detected in
two ventrolateral domains at the prospective apical–trunk boundary
(arrowheads), and in the ventral anterior region of the larva. The gene msx
is first expressed dorsally, in the future mantle ectoderm (arrowheads),
and in the mantle of the larva. b, In 2-day-old post-metamorphic juveniles,
the CNS comprises a main serotonergic anterior commissure (white
arrowhead; dorsoventral view) that innervates the developing lophophore.
The schematic drawing is not to scale, and the blue line represents the
commissure. c, The gene nkx2.1 is expressed in the anterior region
(arrowhead), between the lophophores in 2-day-old juveniles. The genes
nkx2.2 and nkx6 are expressed in the pedicle (arrowheads), and nkx6
is also detected in the gut (arrow). The gene pax6 shows no expression,
pax3/7 is detected in the neural commissure (arrowhead), and msx is
expressed in the cells at the edge of the mantle (dotted line).
d, Neuronal markers in late larvae of T. transversa. The serotonergic
marker tph is expressed in the anteroventral condensation of the mantle
lobe (arrowhead) and in dorsal ectodermal cells of the apical lobe (arrow).
No expression is detected for Hb9, and the genes dbx, VAChT and ChAT are
all detected in the anterior apical neuroectoderm (arrowheads). e, Gene
expression during early gastrulation and elongation, and in late larvae
of N. anomala. The gene nkx2.2 is expressed in the anterior blastoporal
lip at the onset of axial elongation, and it is not detected in the late larva.
The gene nkx6 is asymmetrically expressed around the blastopore, in the
putative anteroventral ectoderm (arrowhead). As the blastopore closes, the
expression extends posteriorly and concentrates along the midline of the
larva (arrowhead). The gene pax3/7 is detected in the posterior mesoderm
at the onset of axial elongation (arrow). The gene msx is expressed in
the prospective mantle lobe ectoderm (arrowheads) and in the dorsal
shell-forming epithelium of the late larva. The asterisks indicate the
animal/anterior pole and white dashed lines in a and d mark the region of
background noise caused by probe trapping in the shell-forming ectoderm.
Panel orientations are indicated in the first row/column and apply to the
rest of the panels in the same column/row. Scale bar, 100 µm in b.
Extended Data Figure 8 | Gene expression in the nemertean
L. ruber. a, None of the nerve cord patterning genes is expressed during
gastrulation in L. ruber. In the intracapsular larva, nkx2.1 is expressed
in the cephalic imaginal discs (arrowheads), nkx2.2 and nkx6 in an
anterior and a posterior domain of the trunk imaginal discs (arrowheads),
respectively, and pax6 is detected both in the cephalic and in the anterior
trunk imaginal discs (arrowheads); pax3/7 is broadly expressed. With
metamorphosis, nkx2.1 is detected in the head and proboscis, nkx2.2 is
detected in the nerve cords and isolated trunk cells (arrowheads), nkx6 is
expressed in the nerve cords (arrowheads), pax6 is observed in the head
and nerve cords (arrowheads), and pax3/7 remains broadly expressed. All
gastrulae are vegetal views. For larvae and early juveniles, the left column 
is a dorsoventral view and the right column is a lateral view (anterior to 
the left). All late juvenile pictures are lateral views, with anterior to the left. 
b, Lateral views (anterior to the left) of neuronal markers in juveniles. 
They are all expressed in the VNCs (arrowheads), and not in the dorsal 
neurite bundle. In all panels, the asterisk indicates the position of the 
mouth opening. Abbreviations: bp, blastopore; mo, mouth; pb, proboscis. 
Extended Data Figure 9 | Molecular patterning and motor neuron
markers in Rotifera and Annelida. a, Expression of the motor neuron
markers Hb9 and ChAT in juveniles of the rotifer E. senta. The gene Hb9
is detected in neurons of the mastax (arrowheads) and weakly in isolated
cells of the brain (arrow). The gene ChAT is detected in the brain (arrow)
and in cells of the corona and mastax (arrowheads). b, Expression of dorsoventral
patterning genes in gastrulae and elongating embryos of O. fusiformis. The
genes nkx2.2 and nkx6 are expressed in the internalized endomesoderm
(arrowheads). The gene pax6 is expressed in two lateral rows during
elongation (arrowhead) and pax3/7 in two lateral cells (arrowhead). Of the
two paralogues, msx-a is first detected in a posterior ectodermal domain
(arrowhead) and in two additional bilaterally symmetrical posterior cells
(arrowheads) during elongation. The gene msx-b is only detected during
elongation in a posterior domain (arrowhead). c, Ventral view of the
expression of nkx2.1 in the juvenile of the annelid O. fusiformis. This gene 
is detected in the foregut (arrowheads) and hindgut (arrow). d, Expression 
of the motor neuron markers Hb9 and ChAT in O. fusiformis. Hb9 is first 
detected in lateral domains of the archenteron/gut during embryogenesis 
and in the larva, and in isolated cells of the ventral trunk of the juvenile. 
The gene ChAT is detected in three cells of the apical region of the embryo 
and larva, and in the neuropile and two lateral ventral cords of the juvenile. 
Abbreviations: bp, blastopore; mo, mouth; ms, mastax. The asterisk in a 
marks the position of the mouth. 
Extended Data Figure 10 | Dorsoventral patterning and the evolution
of bilaterian trunk neuroanatomy. a, b, Schematic drawings of trunk
neuroanatomy (nerve cords in blue) and expression of patterning
genes in spiralian (a) and bilaterian (b) lineages. The overall location of
patterning gene expression domains with respect to the dorsoventral
axis and nerve cords is indicated by light green. In a, the red dashed 
squared expression of pax6 and pax3/7 in brachiopods indicates that these 
expression domains are only in the anterior region of the mantle lobe, 
not all along the trunk. Similarly, the red dashed squared expression of 
nkx6 in rotifers highlights that this gene is only expressed posteriorly in 
the trunk. In b, the red dashed squared expression of nkx2.1, nkx2.2, and 
nkx6 in Cnidaria indicates that these genes are expressed in the pharynx 
ectoderm. The red dashed squared expression of nkx6 in M. stichopi 
shows that this gene is only expressed posteriorly. In the acoel I. pulchra, 
the red dashed squared expression of nkx2.2 specifies that this gene is 
only expressed between mouth and copulatory organ. Red circles imply 
that a gene is not expressed in the trunk or is missing. Question marks
indicate that there are no available data about the expression of that
particular gene. See Supplementary Table 2 and main text for references. 
Schematic drawings are not to scale and only represent approximate 
relative expression domains. c, Alternative scenarios for the evolution of 
the dorsoventral patterning and bilaterian nerve cords. In scenario A, the 
medially condensed nerve cords of vertebrates, arthropods, and annelids 
are homologous. Therefore, the dorsoventral patterning was lost multiple 
times both in lineages with medially condensed nerve cords (for example, 
the annelid O. fusiformis, cephalochordates, and tunicates) and in lineages 
with multiple nerve cords and diffuse nerve nets. In scenario B, which
is supported by this study and is more parsimonious, the similarities
in dorsoventral patterning and trunk neuroanatomies of vertebrates,
arthropods, and some annelids evolved convergently. The diversity of 
nerve cord arrangements in nephrozoan lineages hampers reconstruction 
of the ancestral neuroanatomy for this group (question mark). Animal 
phylogeny is based on ref. 18. 
years of cosmic history


D. P. Marrone, J. S. Spilker, C. C. Hayward, J. D. Vieira, M. Aravena, M. L. N. Ashby, M. B. Bayliss, M. Béthermin, M. Brodwin, M. S. Bothwell, J. E. Carlstrom, S. C. Chapman, Chian-Chou Chen, T. M. Crawford, D. J. M. Cunningham, C. De Breuck, C. D. Fassnacht, A. H. Gonzalez, T. R. Greve, Y. D. Hezaveh, K. Lacaille, K. C. Litke, S. Lower, J. Ma, M. Malkan, T. B. Miller, W. R. Morningstar, E. J. Murphy, D. Narayanan, K. A. Phadke, K. M. Rotermund, J. Sreevani, B. Stalder, A. A. Stark, M. L. Strandet, M. Tang & A. Weiß


According to the current understanding of cosmic structure 
formation, the precursors of the most massive structures in the 
Universe began to form shortly after the Big Bang, in regions 
corresponding to the largest fluctuations in the cosmic density 
field¹⁻³. Observing these structures during their period of active growth and assembly (the first few hundred million years of the Universe) is challenging because it requires surveys that are sensitive enough to detect the distant galaxies that act as signposts for these structures and wide enough to capture the rarest objects. As a result, very few such objects have been detected so far⁴,⁵. Here we report observations of a far-infrared-luminous object at redshift 6.900 (less than 800 million years after the Big Bang) that was discovered in a wide-field survey⁶. High-resolution imaging shows it to be a pair of extremely massive star-forming galaxies. The larger
is forming stars at a rate of 2,900 solar masses per year, contains 
270 billion solar masses of gas and 2.5 billion solar masses of dust, 
and is more massive than any other known object at a redshift of 
more than 6. Its rapid star formation is probably triggered by its 
companion galaxy at a projected separation of 8 kiloparsecs. This 
merging companion hosts 35 billion solar masses of stars and has 
a star-formation rate of 540 solar masses per year, but has an order 
of magnitude less gas and dust than its neighbour and physical 
conditions akin to those observed in lower-metallicity galaxies in 
the nearby Universe⁷. These objects suggest the presence of a dark-matter halo with a mass of more than 100 billion solar masses,
making it among the rarest dark-matter haloes that should exist in 
the Universe at this epoch. 

SPT0311–58 (SPT-S J031132–5823.4) was originally identified in the 2,500-deg² South Pole Telescope (SPT) survey*”? as a luminous source (flux densities of 7.5 mJy and 19.0 mJy at wavelengths of 2.0 mm and 1.4 mm, respectively) with a steeply rising spectrum, indicative of thermal dust emission. Observations with the Atacama Large Millimeter/submillimeter Array (ALMA) provide the redshift of the source. The J = 6–5 and J = 7–6 rotational transitions of the carbon monoxide molecule and the ³P₂–³P₁ fine-structure transition of atomic carbon were found redshifted to 87–103 GHz in a wide spectral scan’. The frequencies and spacings of these lines unambiguously place the galaxy at a redshift of z = 6.900(2), which corresponds to a cosmic age of 780 Myr (using cosmological parameters’? of Hubble constant H₀ = 67.7 km s⁻¹ Mpc⁻¹, matter density Ωm = 0.309 and vacuum energy density ΩΛ = 0.691). An elongated faint object is seen at optical and near-infrared wavelengths, consistent with a nearly edge-on spiral galaxy at z = 1.4 ± 0.4 that acts as a gravitational lens for the background source (see Methods section ‘Modelling the SED’; here and elsewhere the quoted error range corresponds to a 1σ uncertainty). Together, these observations indicate that SPT0311–58 is the most distant known member of the population of massive, infrared-bright but optically dim, dusty galaxies that were identified from ground- and space-based wide-field surveys''.
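As a sketch of how the quoted numbers connect (not the authors' code), dividing rest-frame line frequencies by 1 + z places the CO(6–5), CO(7–6) and [C I] ³P₂–³P₁ lines inside the 87–103 GHz scan, and the closed-form age of a flat ΛCDM universe with the quoted H₀, Ωm and ΩΛ gives roughly 780 Myr at z = 6.900. The rest frequencies are standard laboratory values supplied here rather than taken from the text, and the age formula assumes radiation can be neglected.

```python
import math

KM_PER_MPC = 3.0857e19        # kilometres per megaparsec
SECONDS_PER_GYR = 3.15576e16  # seconds per gigayear (Julian years)

# Rest-frame line frequencies in GHz (standard laboratory values, not from the text)
REST_GHZ = {
    "CO(6-5)": 691.473,
    "CO(7-6)": 806.652,
    "[CI] 3P2-3P1": 809.342,
}


def observed_frequency(rest_ghz, z):
    """Observed frequency of a line emitted at rest_ghz by a source at redshift z."""
    return rest_ghz / (1.0 + z)


def age_at_redshift_myr(z, h0=67.7, omega_m=0.309, omega_l=0.691):
    """Age of a flat matter + Lambda universe at redshift z, in Myr.

    Closed-form solution that neglects radiation and assumes omega_m + omega_l = 1.
    """
    hubble_time_gyr = KM_PER_MPC / h0 / SECONDS_PER_GYR  # 1/H0 in Gyr
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    age_gyr = 2.0 / (3.0 * math.sqrt(omega_l)) * hubble_time_gyr * math.asinh(x)
    return age_gyr * 1e3


z = 6.900
for name, rest in REST_GHZ.items():
    print(f"{name}: observed at {observed_frequency(rest, z):.2f} GHz")
print(f"age at z = {z}: {age_at_redshift_myr(z):.0f} Myr")
```

The same division by 1 + z applied to the [C II] and [O III] rest frequencies gives the 240.57 GHz and 429.49 GHz observing frequencies quoted in the Fig. 1 caption.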

The far-infrared emission from SPT0311–58 provides an opportunity to study its structure with little confusion from the foreground galaxy. We conducted ALMA observations at about 0.3″ resolution at three different frequencies (see Methods): 240 GHz, 350 GHz and 420 GHz, corresponding to rest-frame wavelengths of 160 μm, 110 μm and 90 μm. The observations at 240 GHz include the 158-μm fine-structure line of ionized carbon ([C II]) and those at 420 GHz the 88-μm fine-structure line of doubly ionized oxygen ([O III]). The 160-μm continuum and the [C II] and [O III] line-emission maps of the source are shown in Fig. 1. Two emissive structures are visible in the map, denoted SPT0311–58 E and SPT0311–58 W, which are separated by less than 2″ on the sky before correction for gravitational deflection. Although the morphology of SPT0311–58 E and SPT0311–58 W is reminiscent of a lensing arc (SPT0311–58 W) and counter-image (SPT0311–58 E), the [C II] line clarifies the physical situation: SPT0311–58 E is offset from the brighter source SPT0311–58 W by 700 km s⁻¹ in velocity and is therefore a distinct galaxy.
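The band-to-rest-frame conversion and the size of a 700 km s⁻¹ offset in observed frequency can be checked with a few lines. This is an illustrative sketch; the function names are hypothetical, and the frequency shift uses the small-velocity approximation Δν ≈ νΔv/c (magnitude only). It gives about 158, 108 and 90 μm, which the text rounds to 160, 110 and 90 μm.

```python
C_KM_S = 299792.458          # speed of light, km/s
C_UM_TIMES_GHZ = 299792.458  # c expressed in micrometres x GHz


def rest_wavelength_um(obs_ghz, z):
    """Rest-frame wavelength (micrometres) sampled by observing at obs_ghz."""
    return C_UM_TIMES_GHZ / obs_ghz / (1.0 + z)


def frequency_offset_ghz(obs_ghz, dv_km_s):
    """Magnitude of the observed-frame frequency shift for a velocity offset dv_km_s."""
    return obs_ghz * dv_km_s / C_KM_S


z = 6.900
for band_ghz in (240.0, 350.0, 420.0):
    print(f"{band_ghz:.0f} GHz -> {rest_wavelength_um(band_ghz, z):.0f} um rest frame")
print(f"700 km/s at 240.57 GHz -> {frequency_offset_ghz(240.57, 700.0):.2f} GHz")
```

A shift of roughly 0.56 GHz is many spectral channels wide, which is why the [C II] cube can separate the two galaxies even though they are less than 2″ apart on the sky.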

Figure 1 | Continuum, [C II] and [O III] emission from SPT0311–58 and the inferred source-plane structure. a, Emission in the 157.74-μm fine-structure line of ionized carbon ([C II]) as measured at 240.57 GHz with ALMA, integrated over 1,500 km s⁻¹ of velocity, is shown with the colour scale. The range in flux per synthesized beam (the 0.25″ × 0.30″ beam is shown in the lower left) is provided at right. The rest-frame 160-μm continuum emission that was measured simultaneously is overlaid, with contours at 8, 16, 32 and 64 times the noise level of 34 μJy per beam. SPT0311–58 E and SPT0311–58 W are labelled. b, The continuum-subtracted, source-integrated [C II] (red) and [O III] (blue) spectra. The upper spectra are as observed (‘apparent’) with no correction for lensing, whereas the lensing-corrected (‘intrinsic’) [C II] spectrum is shown at the bottom. SPT0311–58 E and SPT0311–58 W separate almost completely at a velocity of 500 km s⁻¹. c, The source-plane structure after removing the effect of gravitational lensing. The image is coloured according to the flux-weighted mean velocity, showing that the two objects are physically associated but separated by roughly 700 km s⁻¹ in velocity and 8 kpc (projected) in space. The reconstructed 160-μm continuum emission is shown as contours. The scale bar represents the angular size of 5 kpc in the source plane. d, The line-to-continuum ratio at the 158-μm wavelength of [C II], normalized to the map peak. The [C II] emission from SPT0311–58 E is much brighter relative to its continuum than that from SPT0311–58 W. e, Velocity-integrated emission in the 88.36-μm fine-structure line of doubly ionized oxygen ([O III]) as measured at 429.49 GHz with ALMA (colour scale). The data have an intrinsic angular resolution of 0.2″ × 0.3″, but have been tapered to 0.5″ owing to the lower signal-to-noise ratio of these data. f, The luminosity ratio between the [O III] and [C II] lines. As for the [C II] line-to-continuum ratio, a large disparity is seen between SPT0311–58 E and SPT0311–58 W. The sky coordinates and contours for rest-frame 160-μm continuum emission in d–f are the same as in a.

Lens modelling of the 160-μm, 110-μm and 90-μm continuum emission from SPT0311–58 was performed using a pixelated reconstruction technique” (Fig. 1c, Extended Data Fig. 5, Methods section ‘Gravitational lens modelling’). The source structure and lensing geometry are consistent between the observations, and indicate that the two galaxies are separated by a projected (proper) distance of 8 kpc in the source plane. SPT0311–58 E has an effective radius of 1.1 kpc, whereas SPT0311–58 W has a clumpy, elongated structure that is 7.5 kpc across. The (flux-weighted) source-averaged magnifications of each galaxy and of the system as a whole are quite low (μE = 1.3, μW = 2.2, μtot = 2.0) because SPT0311–58 W is extended relative to the lensing caustic and SPT0311–58 E is far from the region of high magnification. The same lensing model applied to the channelized [C II] data reveals a clear velocity gradient across SPT0311–58 W, which could be due either to rotational motions or to a more complicated source structure coalescing at the end of a merger.

With the lensing geometry characterized, it is clear that the two galaxies that comprise SPT0311–58 are extremely luminous. Their intrinsic infrared (8–1,000 μm) luminosities have been determined from observations of rest-frame ultraviolet-to-submillimetre emission (see Methods section ‘Modelling the SED’) to be LIR = (4.6 ± 1.2) × 10¹² L☉ and LIR = (33 ± 7) × 10¹² L☉ for SPT0311–58 E and SPT0311–58 W, respectively, where L☉ is the luminosity of the Sun. If these sources are powered by star formation, as suggested by their extended far-infrared emission, these luminosities are unprecedented at z > 6. The implied (magnification-corrected) star-formation rates are correspondingly enormous, (540 ± 175) M☉ yr⁻¹ and (2,900 ± 1,800) M☉ yr⁻¹, where M☉ is the mass of the Sun, probably owing to the increased instability associated with the tidal forces experienced by merging galaxies'’. The components of SPT0311–58 have luminosities and star-formation rates similar to those of the other massive z > 6 galaxies identified by their dust emission, including HFLS3 (z = 6.34), which has a star-formation rate of 1,300 M☉ yr⁻¹ after correcting for a magnification factor" of 2.2, and a close quasar–galaxy pair!® at z = 6.59, the components of which are forming stars at rates of 1,900 M☉ yr⁻¹ and 800 M☉ yr⁻¹, respectively. However, unlike the latter case, there is no evidence of a black hole in either source in SPT0311–58.

2 | NATURE | VOL 000 | 00 MONTH 2017

Unlike any other massive dusty source at z > 6, the rest-frame ultraviolet emission of SPT0311–58 E is clearly detectable with modest integration by the Hubble Space Telescope. The detected ultraviolet luminosity (LUV = (7.4 ± 0.7) × 10¹⁰ L☉) suggests a star-formation rate of only 13 M☉ yr⁻¹, about 2% of the rate derived from the far-infrared emission, consistent with SPT0311–58 E forming most of its stars behind an obscuring veil of dust. The inferred stellar mass for this galaxy (see Methods section ‘Modelling the SED’) is (3.5 ± 1.5) × 10¹⁰ M☉. No stellar light is convincingly seen from SPT0311–58 W, but this absence of rest-frame ultraviolet emission is probably explained by heavy dust obscuration and is not unusual’®. Although SPT0311–58 E is the less massive of the two components, even it is rare among ultraviolet-detected galaxies at z ≈ 7: such galaxies are found in blank-field surveys to have a sky density of just one per 30 square arcminutes!’.

The far-infrared continuum and line emission of SPT0311–58 E and SPT0311–58 W (Fig. 1d–f) imply substantial differences in the physical conditions in these objects. Compared to SPT0311–58 W, SPT0311–58 E has a higher ratio of [C II] line emission to 160-μm continuum emission and a much larger luminosity ratio between [O III] and [C II]. The [O III] emission is much more luminous in SPT0311–58 E, with most of SPT0311–58 W (excluding the southern end) showing no emission at all. Because the formation of O²⁺ ions requires photons with energies of more than 35.1 eV, this line arises only in ionized regions around the hottest stars and near active galactic nuclei'®. It is unlikely that active galactic nuclei are the origin of the [O III] line in SPT0311–58 E, because the continuum and line emission both extend across most of the galaxy rather than being concentrated in a putative nuclear region. Observations of [O III] 88-μm emission in actively star-forming galaxies at low”! and high”? redshift have found that the line luminosity ratio between [O III] and [C II] increases as gas metallicity decreases. The ultraviolet photons capable of forming O²⁺ have a longer mean free path in a lower-metallicity interstellar medium than in a higher-metallicity one, and the electron temperature remains higher for the same ionizing flux, both of which favour increased [O III] emission!. The difference in the [C II] line-to-continuum ratio may result from multiple effects: the known suppression? of the [C II]-to-LIR ratio in regions of increased star-formation surface density (higher in SPT0311–58 W), and the increased [C II]-to-LIR ratio in star-forming galaxies of lower metallicity’. Whether SPT0311–58 E (or the southern edge of SPT0311–58 W, which is similar to SPT0311–58 E in these properties) has a more primordial interstellar medium than does the bulk of SPT0311–58 W can be tested with future observations.

The masses of the components of SPT0311–58 are remarkable for a time only 780 Myr after the Big Bang. In Fig. 2 we compare SPT0311–58 to objects at z > 5 for which we have estimates of dust mass (Mdust) or total gas mass (Mgas). For SPT0311–58, the best constraints on both of these quantities come from the joint analysis® of its far-infrared continuum and line emission, specifically the rotational transitions of carbon monoxide and neutral carbon. Here we have divided these masses between the two galaxies according to the lensing-corrected ratio of dust continuum emission (6.7) that we determined from our three high-resolution ALMA continuum observations, because the dust continuum luminosity is roughly proportional to the dust mass. The corresponding gas and dust masses for SPT0311–58 W are Mgas = (2.7 ± 1.7) × 10¹¹ M☉ and Mdust = (2.5 ± 1.6) × 10⁹ M☉, and for SPT0311–58 E are Mgas = (0.4 ± 0.2) × 10¹¹ M☉ and Mdust = (0.4 ± 0.2) × 10⁹ M☉. The gas mass can also be estimated from the carbon monoxide luminosity, although the conversion between luminosity and gas mass in this optically thick line is known to vary substantially depending on many factors, including star-formation intensity and metallicity*®. Taking the observed® luminosity in the J = 3–2 line of carbon monoxide, converting it to J = 1–0 under the conservative assumption of thermalized emission, and connecting luminosity to mass using a standard value of αCO = 1.0 M☉ (K km s⁻¹ pc²)⁻¹, we derive Mgas = (6.6 ± 1.7) × 10¹⁰ M☉ for SPT0311–58 W and Mgas = (1.0 ± 0.3) × 10¹⁰ M☉ for SPT0311–58 E. The gas mass of SPT0311–58 W is well above those of all of the known galaxies at z > 6, that is, during the first approximately 900 Myr of cosmic history.
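Both mass estimates reduce to simple arithmetic, sketched below with hypothetical function names. The combined masses fed to the split (3.1 × 10¹¹ M☉ of gas and 2.9 × 10⁹ M☉ of dust) are simply the sums of the quoted W and E values, used here as an assumption because the text does not state the totals directly; splitting them in the ratio 6.7 : 1 recovers the per-galaxy numbers. The CO conversion shows why assuming thermalized emission (line ratio r31 = 1) is the conservative choice.

```python
def split_by_continuum_ratio(total_mass, ratio_w_to_e):
    """Split a combined mass between W and E in proportion to their
    lensing-corrected dust-continuum fluxes (W:E = ratio_w_to_e : 1)."""
    mass_e = total_mass / (1.0 + ratio_w_to_e)
    return total_mass - mass_e, mass_e


def co_gas_mass(l_prime_co32, r31=1.0, alpha_co=1.0):
    """Gas mass in Msun from the CO(3-2) line luminosity L' (K km/s pc^2).

    r31 = L'(3-2)/L'(1-0); r31 = 1 (thermalized emission) gives the smallest
    L'(1-0) for r31 <= 1 and hence the most conservative (lowest) gas mass.
    alpha_co is the conversion factor in Msun / (K km/s pc^2).
    """
    l_prime_co10 = l_prime_co32 / r31
    return alpha_co * l_prime_co10


# Combined masses assumed equal to the sums of the quoted W and E values.
gas_w, gas_e = split_by_continuum_ratio(3.1e11, 6.7)
dust_w, dust_e = split_by_continuum_ratio(2.9e9, 6.7)
print(f"gas:  W = {gas_w:.2e}, E = {gas_e:.2e} Msun")
print(f"dust: W = {dust_w:.2e}, E = {dust_e:.2e} Msun")
```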

Figure 2 | Mass measurements for high-redshift galaxies. Dust masses (Mdust) are taken from the literature, as described in Methods. Gas masses (Mgas) are primarily derived from observations of various rotational transitions of carbon monoxide, with previously reported line luminosities converted to molecular gas masses under standardized assumptions (see Methods). The comparison sample (small filled circles) includes three classes of object: dusty star-forming galaxies (DSFGs; red), quasars (QSOs; blue) and Lyman-break galaxies (LBGs; green). These objects are typically selected by far-infrared emission (DSFGs) or by optical or infrared emission (QSOs and LBGs). Three additional DSFGs, SPT0311–58 E (yellow pentagons), SPT0311–58 W (yellow hexagons) and HFLS3 (red squares), have extensive photometry and line measurements, which enable more sophisticated estimates of their dust and gas masses®”’ from a combined analysis of the dust and carbon monoxide line emission. For these objects we also show masses derived under a simpler assumption as open symbols (for SPT0311–58 the methods give very similar answers for Mdust). Error bars represent 1σ uncertainties.

SPT0311–58 highlights an early and extreme peak in the cosmic density field and presents an opportunity to test the predictions for the growth of structure in the current cosmological model. The mass of the dark-matter halo that hosts SPT0311–58 is uncertain, but can be estimated in several ways. For most massive star-forming galaxies”®”’ the gas mass represents the dominant component of baryons that have cooled and assembled at the centre of the dark-matter halo. In this case, for the lower (αCO-based) estimate of gas mass, the cosmic baryon fraction’® fb = 0.19 places a hard lower bound on the total halo mass of 4 × 10¹¹ M☉. A less conservative assumption incorporates the knowledge, based on observations across a wide range of redshifts, that only a fraction of the baryons in a dark-matter halo (less than one-quarter, Mb/Mhalo = 0.05; see figure 15 of ref. 3) are destined to accrue to the stellar mass of the central galaxy’. In this case, a total halo mass of (1.4–7.0) × 10¹² M☉ is implied, depending on which estimate of gas mass is adopted. To understand the rareness of the dark-matter halo that hosts SPT0311–58, we calculate curves that describe the rarest haloes that should exist in the Universe at any redshift”®. In Fig. 3, we show the halo masses that are inferred for many high-redshift galaxies, using the same methods for converting gas mass to halo mass as described above. We find that SPT0311–58 is indeed closest to the exclusion curves and therefore marks an exceptional peak in the cosmic density field at this time in cosmic history.
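The two halo-mass arguments amount to dividing a baryonic mass by a baryon-to-halo ratio. The sketch below (hypothetical function names) uses the sum of the two αCO-based gas masses quoted above, 7.6 × 10¹⁰ M☉, as its input; dividing by fb = 0.19 reproduces the 4 × 10¹¹ M☉ hard lower bound. The fixed-ratio variant is shown generically, since the exact inputs behind the quoted (1.4–7.0) × 10¹² M☉ range are not spelled out here.

```python
def halo_mass_lower_bound(m_gas, f_baryon=0.19):
    """Hard lower bound: every baryon in the halo is assumed to be in the gas."""
    return m_gas / f_baryon


def halo_mass_fixed_ratio(m_baryon, baryon_to_halo=0.05):
    """Less conservative estimate assuming M_b / M_halo = baryon_to_halo."""
    return m_baryon / baryon_to_halo


# Sum of the two alpha_CO-based gas masses quoted for W and E.
m_gas_co = 6.6e10 + 1.0e10
print(f"lower bound on halo mass: {halo_mass_lower_bound(m_gas_co):.1e} Msun")
```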

Figure 3 | Halo masses for rare, high-redshift, massive galaxies. The mass of the dark-matter halo (Mhalo; defined at a density of 200 times the mean density of matter in the Universe) is inferred for galaxies in the first 2 Gyr after the Big Bang (see Methods). These masses present a range of lower limits, from the most conservative assumption (lower bars) that all baryons in the initial halo have been accounted for in the molecular gas mass to the observationally motivated assumption (upper triangles) that the baryonic mass (Mb) in gas is a fixed ratio of the halo mass, Mb/Mhalo = 0.05, calibrated through a comparison’ of simulations and observations spanning z = 0–8. The most massive haloes that are expected to be observable*® within the whole sky (dotted line), within the 2,500-deg² area of the South Pole Telescope (SPT) survey (dashed line) and within the subset of that area that is magnified by a factor of two or more (solid line) are also plotted as a function of redshift. As SPT0311–58 E and SPT0311–58 W reside within the same halo, they are combined for this analysis. As in Fig. 2, halo masses are derived for HFLS3 (large red triangles) and SPT0311–58 (large yellow triangles) using only the carbon monoxide luminosity (open symbols) and the more sophisticated dust and carbon monoxide analysis (filled symbols); the pairs of points are slightly offset in redshift for clarity.

We have found a system of massive, rapidly star-forming, dusty galaxies at z = 6.900, the most distant galaxies of this type discovered so far. Two compact and infrared-luminous galaxies are seen, separated by less than 8 kpc in projection and 700 km s⁻¹ in velocity, probably in the process of forming one of the most massive galaxies of the era. Even before coalescence, the larger galaxy in the pair is more massive than any other known galaxy at z > 6. Although the discovery of such a system at this high redshift, and in a survey that covered less than 10% of the sky, is unprecedented, its existence is not precluded by the current cosmological paradigm.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

ALMA millimetre and submillimetre interferometry. We acquired four observations of SPT0311–58 with ALMA in four receiver bands (B3, B6, B7 and B8, covering 84–432 GHz) under projects 2015.1.00504.S and 2016.1.01293.S. A summary of these observations, including dates, calibration sources, integration times, atmospheric opacity, noise levels and resolution, is provided in Extended Data Table 1. Salient details are provided below for each observation.

The redshift of SPT0311–58 and the 3-mm continuum flux density were determined from an 84.2–114.9-GHz spectrum assembled from five separate tunings in ALMA band 3 under ALMA Cycle 3 project 2015.1.00504.S. This observing strategy has been used to discover the redshifts of more than 50 SPT dusty sources, and further details on the redshift coverage are provided in previous works*”*!. Data were taken on 2015 December 28 and 2016 January 2 in ALMA configuration C36-1 (baseline lengths of 15–310 m) using 34 and 41 antennas, respectively. The resulting image has a resolution of 3.3″ × 3.5″, although there is spatial information on finer scales that allows us to estimate flux densities separately for the E and W sources, which are separated by about 2″. Further details of the analysis are provided elsewhere’.

ALMA observed SPT0311–58 a second time under project 2015.1.00504.S in band 7 (LO = 343.48 GHz) to produce a continuum image suitable for gravitational lens modelling. Similar observations were used to produce lens models of SPT sources in previous cycles**"?. The observations were performed with 41 antennas in the C40-4 configuration, providing 15–770-m baselines. The resulting image has an angular resolution of 0.3″ × 0.5″, but because it contains no spectral lines, it was found to be insufficient for an unambiguous determination of the lensing configuration.

The ALMA Cycle 4 project 2016.1.01293.S was intended to follow up on the discovery of this very distant source through spectroscopic observations. The 158-μm line of [C II] was observed on 2016 November 3 in ALMA configuration C40-5, which provided baseline lengths of 18–1,120 m. This observation provides the primary imaging for this work, because it yielded an extremely sensitive detection of the [C II] line and continuum structure at high resolution.

A final observation was obtained in ALMA band 8 (LO = 423.63 GHz), in configuration C40-4 (baselines 15–920 m). The observations were repeated in four segments to yield the required integration time. The resulting data have 0.2″ × 0.3″ resolution. These data provide a final spatially resolved continuum observation, at 90-μm rest-frame wavelength, along with spectroscopic images of the 88-μm line of [O III]. The ALMA continuum images are shown in Extended Data Fig. 1.
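The resolutions quoted for each ALMA configuration follow, to within a factor of about 1.5, the rule of thumb θ ≈ λ/Bmax. The sketch below applies it to the frequencies and longest baselines given above; 100 GHz is a representative band 3 value chosen here, and the actual synthesized beam also depends on the full baseline distribution and the imaging weights, which this approximation ignores.

```python
import math

C_M_S = 299792458.0                       # speed of light, m/s
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0  # radians -> arcseconds


def approx_resolution_arcsec(freq_ghz, max_baseline_m):
    """Rule-of-thumb interferometer resolution, theta ~ lambda / B_max."""
    wavelength_m = C_M_S / (freq_ghz * 1e9)
    return wavelength_m / max_baseline_m * RAD_TO_ARCSEC


# (observing frequency in GHz, longest baseline in m); 100 GHz is a representative band 3 value
for label, freq, bmax in [
    ("band 3, C36-1", 100.0, 310.0),
    ("band 7, C40-4", 343.48, 770.0),
    ("band 6, C40-5", 240.57, 1120.0),
    ("band 8, C40-4", 423.63, 920.0),
]:
    print(f"{label}: ~{approx_resolution_arcsec(freq, bmax):.2f} arcsec")
```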
Spitzer infrared imaging. Infrared observations of SPT0311–58 were acquired with the Infrared Array Camera (IRAC) instrument*® on the Spitzer Space Telescope as a part of Cycle 24 Hubble Space Telescope (HST) programme 14740. The observations consisted of 95 dithered 100-s exposures on-source in both operable IRAC arrays at 3.6 μm and 4.5 μm. A large dither throw was used. The dataset thus has sufficiently high redundancy to support our standard reduction procedure, which involves constructing an object-masked median stack of all 95 exposures in each band and then subtracting the median stack from the raw frames to compensate for bad pixels not automatically masked by the pipeline and to remove gradients in the background. After these initial preparatory steps, the background-subtracted exposures were combined in the standard way* with IRACproc* and MOPEX to create mosaics with 0.6″ pixels. The mosaics achieved an effective total integration time of about 9,000 s after masking cosmic rays and other artefacts. Two flanking fields were covered to the same depth but separately, each in one IRAC passband.
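The object-masked median stack can be illustrated with a toy example (not IRACproc or MOPEX; the function names are hypothetical). Frames are kept in detector coordinates, so a dithered sky source lands on a different pixel in each exposure while detector-fixed structure stays put. Masking object pixels and taking a per-pixel median therefore isolates the detector-fixed pattern, which is then subtracted from every frame.

```python
import numpy as np


def masked_median_background(frames, object_masks):
    """Per-pixel median over a stack of dithered frames, ignoring object pixels.

    frames: float array (n_frames, ny, nx) in detector coordinates.
    object_masks: bool array of the same shape, True where a sky object falls.
    """
    stack = np.where(object_masks, np.nan, frames)
    return np.nanmedian(stack, axis=0)


def subtract_median_stack(frames, object_masks):
    """Remove detector-fixed structure (gradients, unmasked bad pixels)."""
    background = masked_median_background(frames, object_masks)
    return frames - background, background


# Toy data: a fixed detector gradient plus one point source that lands on a
# different detector pixel in each of five dithered exposures.
ny, nx, n_frames = 20, 20, 5
yy, xx = np.mgrid[0:ny, 0:nx]
detector_pattern = 100.0 + 0.5 * xx + 0.2 * yy
frames = np.repeat(detector_pattern[None, :, :], n_frames, axis=0)
masks = np.zeros(frames.shape, dtype=bool)
for i in range(n_frames):
    frames[i, 5 + 2 * i, 5 + 2 * i] += 50.0
    masks[i, 5 + 2 * i, 5 + 2 * i] = True

cleaned, background = subtract_median_stack(frames, masks)
```

The approach relies on the redundancy the text mentions: every pixel must be unmasked in enough exposures for the median to be defined, which a large dither throw over many frames provides.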

Photometry was performed on the mosaics using Source Extractor in dual-image mode after trimming to exclude the flanking fields and unexposed areas. The lens galaxy associated with SPT0311–58 was well detected with no evidence of saturation or even nonlinear detector behaviour. During this process, background and object images were generated and inspected to verify that Source Extractor performed as expected and generated valid photometry.

HST imaging. SPT0311–58 was observed for five orbits of HST imaging with ACS and WFC3/IR in Cycle 24 (PID 14740) to determine the morphology of the foreground lens and to better constrain the spectral energy distribution (SED) of both the lens and source. All observations were acquired on 2017 April 30. The ACS imaging consists of a single orbit divided between the F606W and F775W filters, with exposure times of 844 s and 1.5 ks, respectively. Four orbits of WFC3/IR observing were split evenly between the F125W and F160W filters. Although the nominal exposure times are 5.6 ks, a subset of the data in both filters was compromised by substantial contamination from scattered earthlight. We reprocessed the imaging to remove contaminated data, resulting in final exposure times of 4.9 ks in each band.

Gemini optical and infrared imaging and spectroscopy. With the Gemini Multi-Object Spectrograph*” (GMOS) of Gemini-South, we obtained deep i and z images of SPT0311–58 (PID GS-2015B-Q-51) on 2016 January 29 and 31. The instrument consists of three 2,048 × 4,176 pixel CCDs, separated by two 6.46″ (80 pixel) gaps, with a scale of 0.0807″ per pixel. The field of view of the GMOS camera is 5.5′ × 5.5′. Our images were taken under photometric conditions and using 2 × 2 binning, which gives a scale of 0.161″ per pixel. The total integration times were 3,600 s for the i band and 6,600 s for the z band, with average seeing of 1.3″ and 1.0″ in the i and z bands, respectively. The resulting 5σ point-source depths were iAB = 25.2 and zAB = 25.0.

SPT0311–58 was observed using the Facility Near-Infrared Wide-Field Imager and Multi-Object Spectrograph for Gemini (FLAMINGOS-2)** at the Gemini-South Observatory on the nights of UT 2016 September 23 and 2017 February 06, under PID GS-2016B-Q-68. The instrument was used in imaging mode, with 0.181″ pixels, and yielded an unvignetted circular field of view of approximately 5.5′ diameter. Our observing sequence for the survey consisted of a randomly ordered dither pattern, with 15″ offsets about the pointing centre. This pattern was repeated until the required total exposure time was achieved. The individual Ks-band exposure time was set at 15 s in the first observation and 10 s in the second observation, yielding a typical background sky level in the Ks band of 10,000–12,000 counts (detector nonlinearity can be corrected to better than 1% up to 45,000 counts). These count levels ensure that 2MASS stars with Ks > 13 do not saturate and can be used for photometric calibration. The data were reduced using the Python-based FLAMINGOS-2 Data Pipeline, FATBOY**“”. In brief, a calibration dark was subtracted from the dataset, a flat-field image and a bad-pixel map were created, and the data were divided by the flat field. Sky subtraction was performed to remove small-scale structure, with a subsequent low-order correction for the large-scale structure. Finally, the data were aligned and stacked. The seeing averaged 0.7″ in the final image, which comprises 44 min of integration and reaches Ks,AB = 23.6 at 5σ.

Spectroscopy was obtained with the GMOS-S instrument on the nights of UT 2016 February 1 and 2 (PID GS-2016B-Q-68) using the 1″-wide long slit at a position angle of −10° east of north, with the instrument configured with the R400 grating and 2 × 2 detector binning. For a source that fills the 1″ slit, this set-up results in a spectral resolution of about 7 Å. The observations were spectrally dithered, using two central wavelength settings (8,300 Å and 8,400 Å) to cover the chip gaps. The data comprise a series of individual 900-s exposures, dithering the source spatially between two positions (‘A’ and ‘B’) along the slit in an ABBA pattern, repeated four times, two at each central wavelength setting. The total integration time is 4 h. A bright foreground object was positioned along the slit midway between the acquisition star and SPT0311–58, providing an additional reference point for locating traces along the slit.

The spectra were reduced, beginning with bias subtraction and bad-pixel masking, using the IRAF GMOS package provided by Gemini. The individual chips were combined into a single mosaic for each exposure, and the mosaicked frames were then sky-subtracted by differencing neighbouring A–B exposure pairs; this method resulted in nearly Poisson-limited noise, even under the numerous bright sky lines. A flat-field slit-illumination correction was applied and a wavelength calibration derived for each mosaic. The two-dimensional spectrum was created by median-combining the individual exposure frames.
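A–B differencing works because the sky emission falls on the same detector rows in both nod positions while the source moves along the slit. The noiseless toy below (hypothetical names, not the IRAF GMOS tasks) shows the sky lines cancelling exactly, leaving a positive trace at A and a negative trace at B that can be folded back together. In real data the sky photon noise does not cancel, which is why the text describes the result as nearly Poisson-limited rather than noise-free.

```python
import numpy as np


def ab_difference(frame_a, frame_b):
    """Difference a nodded A/B pair: sky emission common to both cancels,
    leaving a positive trace at the A position and a negative one at B."""
    return frame_a - frame_b


# Toy 2D spectra: rows run along the slit, columns along wavelength.
n_slit, n_wave = 40, 100
wave = np.arange(n_wave)
sky = 1000.0 + 500.0 * (wave % 17 == 0)   # continuum plus bright sky lines
sky_frame = np.tile(sky, (n_slit, 1))
frame_a = sky_frame.copy()
frame_a[10, :] += 5.0                     # faint source at position A
frame_b = sky_frame.copy()
frame_b[30, :] += 5.0                     # same source nodded to position B

diff = ab_difference(frame_a, frame_b)
combined = diff[10, :] - diff[30, :]      # fold the negative trace back in
```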

The spectrum shows a faint continuum beginning above 9,000 Å at the location of SPT0311–58. A one-dimensional extraction of the faint trace yields no reliable redshift measurement, but is consistent with the redshifted 4,000-Å break that is expected for the foreground galaxy at z ≈ 1.4. Calibrating against the spectrum of a nearby R = 16.4 star, we find no flux at the expected location of Lyα redshifted to z = 6.900 (about 9,600 Å) down to a 3σ flux limit of 3.0 × 10⁻¹⁷ erg s⁻¹ cm⁻² for an emission line 500 km s⁻¹ wide.
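The wavelengths in this paragraph follow from λobs = λrest(1 + z). The sketch below uses the standard Lyα rest wavelength (1,215.67 Å, supplied here rather than taken from the text) and a nominal 4,000-Å break; both features land near 9,600 Å, and a 500 km s⁻¹ line at that wavelength would span roughly 16 Å, about twice the quoted ~7 Å spectral resolution. Function names are hypothetical.

```python
C_KM_S = 299792.458    # speed of light, km/s
LYA_REST_A = 1215.67   # Lyman-alpha rest wavelength, angstroms
BREAK_REST_A = 4000.0  # nominal 4,000-angstrom break, angstroms


def observed_wavelength(rest_a, z):
    """Observed wavelength of a rest-frame feature at redshift z."""
    return rest_a * (1.0 + z)


def line_width_a(obs_a, fwhm_km_s):
    """Observed-frame width in angstroms of a line of velocity width fwhm_km_s."""
    return obs_a * fwhm_km_s / C_KM_S


lya_obs = observed_wavelength(LYA_REST_A, 6.900)
print(f"Lya at z = 6.900: {lya_obs:.0f} A")
print(f"4,000-A break at z = 1.4: {observed_wavelength(BREAK_REST_A, 1.4):.0f} A")
print(f"500 km/s line width at Lya: {line_width_a(lya_obs, 500.0):.1f} A")
```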

Image de-blending. At the position of SPT0311–58, our optical and infrared images (Extended Data Fig. 2) show a prominent lower-redshift galaxy that is responsible for lensing the W source, and the HST images, which have the highest resolution, show direct stellar emission from the E source (Extended Data Fig. 3). To extract reliable photometry for SPT0311–58 E, particularly in the low-resolution Spitzer images that cover the rest-frame optical, and to search for emission from the W source underneath the lens galaxy, we must model and remove the lens emission. We follow procedures similar to those used previously*!, using the HST/WFC3 images as the source of the lens-galaxy model to de-blend the IRAC image. The foreground lens can be fitted with a single Sérsic profile with an index n = 1.77. As seen in Extended Data Fig. 4, there is no clear rest-frame ultraviolet emission from SPT0311–58 W in the HST bands after removal of the lens model. To remove the lens from the IRAC image, the WFC3 model is convolved with the IRAC point spread function and then subtracted from the 3.6-μm and 4.5-μm images. Residual emission is seen near the positions of the E and W sources. Unfortunately, because SPT0311–58 W lies right on top of the lens, the residuals are extremely susceptible to image de-convolution errors and we do not believe the Spitzer/IRAC fluxes to be reliable. By contrast, SPT0311–58 E is one full IRAC resolution element, 1.7″, from the lens centroid, and we consider the residual emission at this position to be usable in our subsequent analyses. Images of the model and residuals are provided in Extended Data Fig. 4 and the resulting photometry is provided in Extended Data Table 2.
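The de-blending step (a high-resolution Sérsic model, blurred to the low-resolution PSF and subtracted) can be sketched as follows. Only the Sérsic index n = 1.77 comes from the text; the image size, effective radius, amplitude, Gaussian stand-in for the IRAC PSF and neighbouring point source are hypothetical values. The bₙ constant uses the Ciotti and Bertin asymptotic expansion, and the FFT convolution is circular, so a real implementation would pad the image to avoid wrap-around at the edges.

```python
import numpy as np


def sersic_bn(n):
    """Ciotti & Bertin (1999) asymptotic expansion for the Sersic b_n constant."""
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n**2)


def sersic_image(shape, x0, y0, r_e, n, i_e, q=1.0, pa=0.0):
    """Elliptical Sersic surface brightness sampled at pixel centres."""
    yy, xx = np.indices(shape, dtype=float)
    dx, dy = xx - x0, yy - y0
    c, s = np.cos(pa), np.sin(pa)
    x_maj = dx * c + dy * s          # coordinate along the major axis
    y_min = -dx * s + dy * c         # coordinate along the minor axis
    r = np.hypot(x_maj, y_min / q)   # elliptical radius
    return i_e * np.exp(-sersic_bn(n) * ((r / r_e) ** (1.0 / n) - 1.0))


def gaussian_psf(shape, fwhm):
    """Normalized circular Gaussian PSF centred at pixel (ny//2, nx//2)."""
    yy, xx = np.indices(shape, dtype=float)
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    psf = np.exp(-((yy - shape[0] // 2) ** 2 + (xx - shape[1] // 2) ** 2) / (2.0 * sigma**2))
    return psf / psf.sum()


def convolve_fft(image, psf):
    """Circular FFT convolution; ifftshift moves the PSF centre to pixel (0, 0)."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(np.fft.ifftshift(psf))))


shape = (64, 64)
# n = 1.77 is the lens index from the text; all other numbers are hypothetical.
lens_hst = sersic_image(shape, 32, 32, r_e=4.0, n=1.77, i_e=10.0)
psf_irac = gaussian_psf(shape, fwhm=5.0)
lens_model_irac = convolve_fft(lens_hst, psf_irac)

neighbour = np.zeros(shape)
neighbour[32, 45] = 100.0
observed = lens_model_irac + convolve_fft(neighbour, psf_irac)
residual = observed - lens_model_irac
```

In this idealized case the residual is exactly the blurred neighbour. With real data, any mismatch between the model and the true lens or PSF leaks into the residual most strongly near the lens centre, which is the failure mode the text describes for SPT0311–58 W.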

Gravitational lens modelling. Gravitational lens modelling of SPT0311-58 was
performed using two different codes, which model the source-plane emission in
different ways. Both codes fit directly to the visibilities measured by ALMA or other
interferometers, which avoids the correlated noise between pixels in inverted images.
In each, the lens galaxy is modelled as a singular isothermal ellipsoid, and posterior
parameter distributions are sampled using a Markov chain Monte Carlo technique,
marginalizing over several sources of residual calibration uncertainty (such as
antenna-based phase errors).
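Fitting in the visibility (u, v) plane rather than to an inverted image can be illustrated with a deliberately simple model. The sketch below assumes a single point source of unknown flux at a fixed, hypothetical sky offset, computes the chi-squared between model and data visibilities, and solves for the flux that minimizes it. The lens deflection, the MCMC sampling and the calibration nuisance parameters used by the actual codes are all omitted, and the baselines and noise values are invented for illustration.

```python
import cmath
import math


def point_source_visibility(u: float, v: float, l: float, m: float) -> complex:
    """Visibility of a unit-flux point source at sky offset (l, m) in radians,
    for a baseline (u, v) measured in wavelengths."""
    return cmath.exp(-2j * math.pi * (u * l + v * m))


def chi_squared(data, model, sigma) -> float:
    """Sum of |data - model|^2 / sigma^2 over all visibilities."""
    return sum(abs(d - mo) ** 2 / s ** 2 for d, mo, s in zip(data, model, sigma))


def best_fit_flux(data, template, sigma) -> float:
    """Real flux S minimising chi-squared for model = S * template
    (linear least squares: S = sum Re(t* d)/s^2 / sum |t|^2/s^2)."""
    num = sum((t.conjugate() * d).real / s ** 2 for d, t, s in zip(data, template, sigma))
    den = sum(abs(t) ** 2 / s ** 2 for t, s in zip(template, sigma))
    return num / den


# Hypothetical baselines (wavelengths), source offset (radians) and noise.
baselines = [(1.0e5, 0.0), (0.0, 2.0e5), (1.5e5, -5.0e4)]
offset = (1.0e-6, -2.0e-6)
sigma = [0.1, 0.1, 0.2]
template = [point_source_visibility(u, v, *offset) for u, v in baselines]
data = [5.0 * t for t in template]  # noise-free synthetic data with flux 5
flux = best_fit_flux(data, template, sigma)
print(flux, chi_squared(data, [flux * t for t in template], sigma))
```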

Initial lens models were created using the visilens code, which is described in
detail elsewhere. The source plane is modelled as one or more elliptical Sérsic
profiles. Because of the simplicity of this source-plane representation, the code is
able to sample large and complex parameter spaces quickly. The continuum emission at
160 μm, 110 μm and 90 μm was modelled with four Sérsic components, one for SPT0311-58 E
and three for SPT0311-58 W. These models leave peak residuals of approximately 8σ in the
160-μm and 90-μm data, both of which reach peak signal-to-noise ratios of more than 150.

After determining the lens parameters using visilens, we used the best-fitting values
as initial input to a pixelated reconstruction code. This code represents the source
plane as an array of pixels, rather than an analytic model, and determines the most
probable pixel intensity values for each trial lens model while imposing a
gradient-type regularization to avoid over-fitting the data. For each dataset, we fit
for the strength of this regularization. At 160 μm and 90 μm we re-fit for the lens
model parameters and compare to the visilens models as a test of the robustness of the
lens modelling. Within each code, the best-fitting lens parameters at the two
independent wavelengths are consistent to within 10%. Further, the lens parameters and
the source structure are consistent between the two independent codes, with intrinsic
source flux densities, sizes and magnifications that agree to within 15%. The
increased freedom in the source plane afforded by the pixelated reconstruction means
that the lens parameters are not independently well constrained by the 110-μm data,
which have lower signal-to-noise ratio and spatial resolution. For these data, we
apply the lensing deflections determined from the other two datasets to reconstruct
the source-plane emission. The pixelated reconstructions of the three continuum
wavelengths are shown in Extended Data Fig. 5.
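The regularized pixel reconstruction can be sketched as a linear least-squares problem: find source pixels s that minimize |d − L s|² + λ|D s|², where L maps source pixels to data and D is a gradient (finite-difference) operator. The sketch below is a generic toy, not the authors' proprietary code. It assumes unit noise, a fixed λ instead of a fitted one, and a hypothetical identity "lensing" operator on five pixels, and solves the normal equations (LᵀL + λDᵀD)s = Lᵀd directly.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gradient_operator(n):
    """(n - 1) x n forward-difference matrix: (D s)_i = s_{i+1} - s_i."""
    return [[-1.0 if j == i else (1.0 if j == i + 1 else 0.0) for j in range(n)]
            for i in range(n - 1)]


def reconstruct(lens_matrix, data, lam):
    """Minimise |data - L s|^2 + lam |D s|^2 via (L^T L + lam D^T D) s = L^T data."""
    n_pix = len(lens_matrix[0])
    grad = gradient_operator(n_pix)
    lhs = [[sum(row[i] * row[j] for row in lens_matrix)
            + lam * sum(g[i] * g[j] for g in grad)
            for j in range(n_pix)] for i in range(n_pix)]
    rhs = [sum(row[i] * d for row, d in zip(lens_matrix, data)) for i in range(n_pix)]
    return solve_linear(lhs, rhs)


def roughness(s):
    """Sum of squared neighbour differences, the quantity the regularizer penalizes."""
    return sum((b - a) ** 2 for a, b in zip(s, s[1:]))


identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
data = [1.0, 3.0, 2.0, 4.0, 3.5]
for lam in (0.0, 1.0, 1e6):
    print(lam, [round(v, 3) for v in reconstruct(identity, data, lam)])
```

With λ = 0 the data are reproduced exactly; as λ grows the solution is smoothed toward the mean, which is why the regularization strength has to be fitted rather than fixed by hand.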

The channelized [C II] line is modelled using the same pixelated reconstruction
technique, using 39 consecutive channels of 40 km s⁻¹ width, each with a peak
signal-to-noise ratio ranging from 9 to 34. For each channel, we apply the lensing
deflections from the best-fitting model of the 160-μm data, which were observed
simultaneously. We fit for the strength of the source-plane regularization at each
channel; it varies across the line profile because some velocities (such as those
multiply imaged from −280 km s⁻¹ to +80 km s⁻¹) experience higher magnification than
others (such as the entire eastern source at >+560 km s⁻¹). The models of each [C II]
channel are presented in Extended Data Fig. 6.
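For orientation, the 40 km s⁻¹ channels can be converted to frequency. This worked example assumes a [C II] rest frequency of 1900.5369 GHz and the redshift z = 6.9 quoted in the caption of Extended Data Fig. 3. It also uses the small-velocity relation Δν/ν = Δv/c, so the numbers are approximate.

```python
C_KM_S = 299792.458          # speed of light in km/s
NU_REST_CII_GHZ = 1900.5369  # [C II] 158-um rest frequency (assumed value)


def observed_frequency_ghz(nu_rest_ghz: float, z: float) -> float:
    """Redshifted line frequency."""
    return nu_rest_ghz / (1.0 + z)


def channel_width_mhz(nu_obs_ghz: float, dv_km_s: float) -> float:
    """Frequency width of a velocity channel, using dnu / nu = dv / c."""
    return nu_obs_ghz * 1.0e3 * dv_km_s / C_KM_S


z = 6.9
nu_obs = observed_frequency_ghz(NU_REST_CII_GHZ, z)
width = channel_width_mhz(nu_obs, 40.0)
n_channels = 39
print(f"[C II] observed at {nu_obs:.2f} GHz; 40 km/s = {width:.1f} MHz; "
      f"{n_channels} channels span {n_channels * 40.0:.0f} km/s")
```

The redshifted line falls near 240.6 GHz, within the band 6 tuning listed in Extended Data Table 1. Each channel is about 32 MHz wide, and the 39 channels cover 1,560 km s⁻¹.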

We determine the source magnifications using the 90-μm pixelated model, in which the E
source is detected at the highest signal-to-noise ratio, so that the effects of varying
the aperture used to measure the intrinsic flux density are minimized. Because the
source-plane morphology is very similar between the three continuum wavelengths, the
magnification is also essentially identical between them. We find flux-weighted,
source-averaged magnifications for the E source, the W source and the system as a whole
of μ_E = 1.3, μ_W = 2.2 and μ_tot = 2.0, respectively. These magnifications are
substantially lower than the median magnification of 5.5 within the sample of 47
SPT-discovered dusty galaxies for which we have data adequate to construct lens models
or to conclude that sources are unlensed. In this case the low magnification is a
consequence of the low mass of the lensing halo, which is typically expressed as an
Einstein radius θ_E. The lens model for this source indicates θ_E = 0.29″, which is
around the 10th percentile for SPT lensed sources, and the background source is both
much larger than and offset from the regions of highest magnification. A large portion
of the source is therefore only weakly magnified and the source-averaged values are low.
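The two ideas in this paragraph can be made concrete. First, a flux-weighted, source-averaged magnification is the total lensed flux divided by the total intrinsic flux. Second, source regions far from the caustic are only weakly magnified. The lens model here is a singular isothermal ellipsoid. The sketch below uses its circular special case, the singular isothermal sphere, for which point-source magnifications have a closed form. Only θ_E = 0.29″ is taken from the text; the source offsets and flux split are hypothetical.

```python
def flux_weighted_magnification(source_fluxes, magnifications):
    """Source-averaged magnification: total lensed (image-plane) flux divided
    by total intrinsic (source-plane) flux."""
    lensed = sum(s * mu for s, mu in zip(source_fluxes, magnifications))
    return lensed / sum(source_fluxes)


def sis_total_magnification(beta: float, theta_e: float) -> float:
    """Total magnification of a point source at angular offset beta from a
    singular isothermal sphere with Einstein radius theta_e."""
    if beta <= 0.0:
        raise ValueError("beta must be positive (beta = 0 is the point caustic)")
    if beta < theta_e:
        return 2.0 * theta_e / beta   # two images
    return 1.0 + theta_e / beta       # one image


theta_e = 0.29  # arcsec
for beta in (0.05, 0.29, 1.0):
    print(beta, round(sis_total_magnification(beta, theta_e), 2))

# Hypothetical source: 20% of its flux near the lens axis, 80% at 1 arcsec.
mus = [sis_total_magnification(0.1, theta_e), sis_total_magnification(1.0, theta_e)]
print(round(flux_weighted_magnification([0.2, 0.8], mus), 2))
```

Even a highly magnified compact region contributes little to the average when most of the flux sits at offsets of several θ_E.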

Finally, we also construct a lens model of the 95-GHz ALMA data (rest-frame 380 μm;
Extended Data Table 1). Because the spatial resolution of these data is low (3.5″), we
model them using only the visilens code, which is better suited to low-resolution data.
We allow the lens parameters and source structural parameters (such as position and
radius) to vary only within the ranges determined from the higher-resolution 160-μm,
110-μm and 90-μm continuum data, leaving the flux densities of the E and W sources as
the only free parameters. This modelling indicates that essentially all of the observed
380-μm emission can be ascribed to the W source, with the E source ‘detected’ at about 1σ.


In addition to the ALMA data, we use Herschel photometry to constrain the SEDs of
SPT0311-58 E and SPT0311-58 W down to rest-frame 30 μm (250 μm observed). The
resolution of Herschel SPIRE is not adequate to separate the two components, so we
divide the total flux density observed in each of the three SPIRE bands between the E
and W sources according to the ratios observed in the ALMA bands. These photometric
points are then corrected for the continuum magnification derived from the ALMA data
and used in the SED modelling described below. The total and intrinsic flux densities
are reported in Extended Data Table 3.
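The split-then-demagnify step can be sketched as follows. The function apportions a blended SPIRE flux density in proportion to the apparent (lensed) ALMA flux densities of the two components and then divides each share by its magnification. The ALMA inputs below are hypothetical, the magnifications are those quoted in the text, and uncertainty propagation is omitted.

```python
def split_and_demagnify(total_flux, alma_e, alma_w, mu_e, mu_w):
    """Divide a blended flux density between two components in proportion to
    their apparent (lensed) ALMA flux densities, then remove the magnification.
    Returns the intrinsic flux densities (E, W)."""
    frac_e = alma_e / (alma_e + alma_w)
    apparent_e = total_flux * frac_e
    apparent_w = total_flux - apparent_e
    return apparent_e / mu_e, apparent_w / mu_w


# Hypothetical inputs in mJy; magnifications as quoted in the text.
e, w = split_and_demagnify(30.0, alma_e=1.0, alma_w=9.0, mu_e=1.3, mu_w=2.2)
print(f"E: {e:.2f} mJy, W: {w:.2f} mJy")
```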
Modelling the SED. In Extended Data Fig. 7 we present the SEDs of SPT0311-58 E,
SPT0311-58 W and the foreground lens galaxy.

A photometric redshift for the lens is calculated with EAZY using the data in
Extended Data Table 2. The resulting redshift is 1.43, with a 1σ confidence interval
of 1.08–1.85. The lens SED fitting is performed with the Code Investigating GALaxy
Emission (CIGALE) assuming z = 1.43.

The multiple rest-frame ultraviolet to rest-frame optical detections of SPT0311-58 E
allow us to constrain the stellar mass using reasonable assumptions about the
star-formation history at this early point in cosmic history. The SED is fitted by
varying the e-folding time and age of a previously reported stellar population model
under single- and two-component formation histories, assuming solar metallicity and a
previously reported initial mass function. The minimum radiation field, the power-law
slope and γ (the fraction of dust mass exposed to radiation intensities above the
minimum) from one dust model, and the colour excess and attenuation slope from other
dust models, are kept free in the SED fitting. The AGN contribution is set to zero
because there are no photometric points to constrain the spectral range that is most
affected by AGN power (mid-infrared), and thus any fraction between 0% and 60% of the
dust luminosity is attributable to AGNs with nearly equal probability. However, this
ignores the spatial distributions of the dust and line emission, which are not strongly
peaked as is usually observed in AGN-dominated galaxies, so we deem this wide range to
be unphysical. The inferred stellar mass and star-formation rate are
(3.5 ± 1.5) × 10¹⁰ M☉ and (540 ± 175) M☉ yr⁻¹, respectively, for the two-component
star-formation history. These values agree within the uncertainties with those for a
single-component star-formation history. The infrared luminosity (L_IR; integrated
over 8–1,000 μm) is (4.6 ± 1.2) × 10¹² L☉ and the extinction is A_V = 2.7 ± 0.2 mag.

For the W source, we have only upper limits and the potentially contaminated IRAC
detections to constrain the rest-frame optical and ultraviolet emission. Accordingly,
we use the IRAC photometry as upper limits, along with the HST limits and far-infrared
data in Extended Data Table 3, and model the SED with CIGALE. We find a luminosity of
L_IR = (33 ± 7) × 10¹² L☉, seven times larger than for the E source. A consistent
luminosity is obtained by fitting the far-infrared SED with a modified blackbody. The
inferred star-formation rate, which is closely connected to L_IR, is
(2,900 ± 1,800) M☉ yr⁻¹. As for the E source, the SED allows the AGN fraction to fall
between 0% and 60% with roughly equal probability, so we take the absence of a dominant
infrared emission region (see Fig. 1c and Extended Data Fig. 5) as an indication that
the AGN contribution is unlikely to be important and fix the AGN fraction to zero. If
the spatial distribution of the emission is ignored, the dust luminosity due to star
formation could therefore in principle be up to a factor of two smaller. Given that the
photometry reaches only the rest-frame V band, it is possible to hide a very large
stellar mass behind dust obscuration for plausible values of the visual extinction
(A_V < 6, as seen in other massive dusty galaxies). Considering the IRAC flux densities
alone, we can calculate rest-frame mass-to-light ratios for the observed bands to see
what masses could exist without relying on the poorly constrained CIGALE SED modelling.
We use a stellar population synthesis code to compute a stellar mass-to-light ratio
under a range of assumptions: stellar ages of 0.1–0.8 Gyr (from a reasonably ‘young’
population to the approximate age of the Universe at the time) and metallicities of
0.1–1 times that of the Sun, with no dust attenuation. The mass then ranges from
(2–10) × 10¹⁰ M☉ per μJy of measured flux density. Taking the measured and de-magnified
flux density (averaged between the two wavelengths) of 0.5 μJy, we find a stellar mass
of (1–5) × 10¹⁰ M☉ before correcting for extinction. If the extinction is as large as
5 mag, the true stellar mass could be unphysically large (>10¹² M☉), demonstrating that
we have no useful constraint without greater certainty about the reliability of the
IRAC flux densities or more photometric data points.
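The arithmetic behind these numbers can be retraced with a short sketch. It converts AB magnitudes to flux density (m_AB = 23.9 corresponds to 1 μJy), averages the two IRAC values for SPT0311-58 W from Extended Data Table 2, and de-magnifies by μ_W = 2.2. It then applies the quoted (2–10) × 10¹⁰ M☉ per μJy range and a correction of 10^(0.4 A_V) for extinction. The exact averaging used in the analysis may differ, so this is an illustrative check.

```python
def ab_mag_to_microjansky(m_ab: float) -> float:
    """AB magnitude to flux density in microjansky (m_AB = 23.9 <-> 1 uJy)."""
    return 10.0 ** ((23.9 - m_ab) / 2.5)


def stellar_mass_range(flux_ujy, ml_low=2.0e10, ml_high=10.0e10, a_mag=0.0):
    """Stellar-mass range (solar masses) from a de-magnified flux density, a
    mass-per-microjansky range and an optional extinction in magnitudes."""
    boost = 10.0 ** (0.4 * a_mag)  # restores the flux removed by extinction
    return flux_ujy * ml_low * boost, flux_ujy * ml_high * boost


# IRAC 3.6 and 4.5 um magnitudes of SPT0311-58 W (Extended Data Table 2).
fluxes = [ab_mag_to_microjansky(m) for m in (23.87, 23.63)]
intrinsic = sum(fluxes) / len(fluxes) / 2.2
print(f"de-magnified flux density ~ {intrinsic:.2f} uJy")
print(stellar_mass_range(0.5))             # no extinction
print(stellar_mass_range(0.5, a_mag=5.0))  # 5 mag of extinction
```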

Galaxy and halo masses. In Figs 2 and 3 we compile mass measurements for high-redshift
galaxies discovered by various techniques. The galaxy sample comprises primarily
galaxies identified through their luminous dust emission (DSFGs) and optically
identified quasars (QSOs), which are typically the objects with the largest gas, dust
or stellar masses at these redshifts. At the very highest redshifts, where very few
galaxies have been found, objects selected on the basis of their ultraviolet emission
are also included. The subsets of galaxies included in each figure overlap
considerably, but are not identical because not all of the requisite information is
available for each source.

Dust mass. Dust mass estimates are taken unmodified from literature values, owing to
the heterogeneity of the data available across the sample. The dust masses are
generally derived from the far-infrared continuum emission, using one to several
wavelengths. Differences between the cosmology assumed here and in previous work
result in negligible corrections and are ignored.

Gas mass. Following standard observational practice, the primary source for the gas
masses shown in Fig. 2b is measurement of the luminosity of rotational transitions of
CO. The lowest available rotational transition is typically used; any translation
between the observed transition and the J = 1–0 line, which is most commonly used as a
molecular gas indicator, is taken from the original source. Rather than accepting the
varying coefficients for the conversion of CO luminosity to gas mass, we re-calculate
all masses using a common value of α_CO = 1.0 M☉ (K km s⁻¹ pc²)⁻¹, which is a typical
value for actively star-forming galaxies. For one source the gas mass is estimated
through the star-formation surface density.
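The re-calculation with a common conversion factor amounts to two steps. First, the observed line luminosity L′ is translated to the CO(1–0) equivalent using the line ratio taken from each original paper. Second, the result is multiplied by α_CO = 1.0 M☉ (K km s⁻¹ pc²)⁻¹. The sketch below assumes the ratio is expressed as r_J1 = L′(J→J−1)/L′(1→0); the example luminosity and ratio are hypothetical.

```python
ALPHA_CO = 1.0  # M_sun (K km s^-1 pc^2)^-1, the common value adopted here


def gas_mass(l_prime_obs: float, r_j1: float = 1.0, alpha_co: float = ALPHA_CO) -> float:
    """Molecular gas mass (M_sun) from an observed CO line luminosity L'
    in K km s^-1 pc^2. r_j1 = L'(J -> J-1) / L'(1 -> 0); use r_j1 = 1 when the
    observed line is CO(1-0) itself."""
    l_prime_10 = l_prime_obs / r_j1
    return alpha_co * l_prime_10


# Hypothetical example: L'_CO(3-2) = 5e10 K km s^-1 pc^2 with r_31 = 0.5.
print(f"{gas_mass(5.0e10, r_j1=0.5):.2e} M_sun")
```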

Halo mass. The halo masses of Fig. 3 are derived from the gas mass sample above. Each
halo mass is represented using a range of values, starting with a conservative and hard
lower limit found by dividing the measured gas mass by the universal baryon fraction
f_b = 0.19. This lower limit ignores any baryonic mass that has been converted into
stars or into hot or cool atomic gas phases, which would increase the inferred halo
mass. A more realistic, but still conservative, lower limit is represented by the top
of the plotted symbols in Fig. 3. Here we assume that the ratio of baryonic mass to halo
mass is M_b/M_halo = 0.05. This value is a factor of about four less than the universal
baryon fraction but still higher than the typical stellar-to-halo mass ratio inferred
for haloes of any mass and redshift via subhalo abundance matching. Given that we do
not expect high-mass galaxies such as SPT0311-58 to expel a large fraction of their
molecular gas content or to later accrete dark matter without also accreting gas in
proportion to the universal baryon fraction, it is reasonable to expect that the
baryon-to-halo mass ratio should be less than this inferred upper limit on the
stellar-to-halo mass ratio across all masses and redshifts.
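The two lower limits on the halo mass follow directly from the gas mass. The sketch divides by f_b = 0.19 for the hard limit and by M_b/M_halo = 0.05 for the more realistic limit, with both values taken from the text. The gas mass in the example is hypothetical.

```python
F_BARYON = 0.19        # universal baryon fraction as adopted in the text
BARYON_TO_HALO = 0.05  # "more realistic" baryon-to-halo mass ratio


def halo_mass_bounds(m_gas: float):
    """Return (hard lower limit, realistic lower limit) on the halo mass."""
    return m_gas / F_BARYON, m_gas / BARYON_TO_HALO


hard, realistic = halo_mass_bounds(1.0e11)  # hypothetical 1e11 M_sun of gas
print(f"M_halo > {hard:.2e} (hard), > {realistic:.2e} (realistic) M_sun")
```

The two limits differ by the factor 0.19/0.05 ≈ 3.8, which is the "factor of about four" mentioned above.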
HFLS3 and SPT0311-58 masses. For the two most distant DSFGs, HFLS3 and SPT0311-58,
which have extensive far-infrared photometry and atomic and molecular line
measurements, we also compute the gas mass using a joint continuum-line radiative
transfer model. The mass for SPT0311-58 has been computed previously without spatially
resolved (CO and [C I]) line emission. For Fig. 3, only the total gas mass of the two
SPT0311-58 sources is important for estimating the halo mass. For Fig. 2, the dust mass
is divided between the two sources according to the ratio of dust continuum emission in
our resolved observations. The gas mass is similarly divided, although the velocity
profile of the CO lines provides weak evidence that the molecular gas is concentrated
in SPT0311-58 W, which would increase the gas mass for this source by 15%.

Calculation of halo rareness. Figure 3 demonstrates the ‘rareness’ of SPT0311-58 by
considering its position in the dark-matter halo mass–redshift plane compared with
other extreme high-redshift objects (DSFGs, QSOs and an LBG) that are believed to be
hosted by massive dark-matter haloes. To quantify the rareness of these extreme objects
we use a previously reported method, including a MATLAB script
(https://bitbucket.org/itrharrison/hh13-cluster-rareness) that we modified slightly to
extend the calculation to z = 10. This method enables us to compute (z, M_halo)
contours (‘exclusion curves’) above which the Poisson probability of such an object
being detected in the standard ΛCDM cosmology is less than α < 1; the existence of a
single object above such an exclusion curve is sufficient to rule out ΛCDM at the
100(1 − α)% confidence level. In Fig. 3, we plot 1σ exclusion curves (α = 0.32). Of the
three different statistical measures of rareness proposed, we use the ‘>ν’ measure,
which quantifies the rareness according to the minimum height of the primordial
density perturbation from which a halo of mass M_halo and redshift z could have formed:
$\nu(M_{\rm halo}, z) \propto [D_+(z)\,\sigma(M_{\rm halo})]^{-1}$, where D₊(z) is the
normalized linear growth function and σ²(M_halo) is the variance of the matter power
spectrum smoothed on the co-moving spatial scale that corresponds to the mass M_halo.
This statistic is sensitive to changes in the ΛCDM initial conditions, such as
primordial non-Gaussianity (which would lead to more high-mass dark-matter haloes at a
given redshift than expected in the standard ΛCDM cosmology). For the purposes of this
calculation, we assume a ΛCDM cosmology with parameters Ω_m = 0.309, Ω_Λ = 0.691,
h₀ = 0.677 and σ₈ = 0.816 and use a previously reported halo mass function.
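The redshift dependence of the ν statistic enters through the growth function D₊(z). The sketch below computes D₊(z) for the quoted Ω_m and Ω_Λ from the standard integral for a matter plus Λ universe, D ∝ E(a)∫₀ᵃ da′/[a′E(a′)]³, normalized to unity at z = 0. It then forms a peak height ν = δ_c/[D₊(z)σ]. Two inputs are assumptions not given in the text: the collapse threshold δ_c ≈ 1.686, and σ, which in practice would come from the smoothed power spectrum. This is not the MATLAB script used for Fig. 3.

```python
import math

OMEGA_M, OMEGA_L = 0.309, 0.691
DELTA_C = 1.686  # spherical-collapse threshold (an assumption, not from the text)


def _growth_integral(a, omega_m, omega_l, steps=2000):
    """Simpson integral of da' / (a' E(a'))^3 from 0 to a, rewritten as
    a'^{3/2} / (omega_m + omega_l a'^3)^{3/2} so it stays finite at a' = 0."""
    h = a / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        total += weight * x ** 1.5 / (omega_m + omega_l * x ** 3) ** 1.5
    return total * h / 3.0


def _growth_unnormalised(a, omega_m=OMEGA_M, omega_l=OMEGA_L):
    """Linear growth for matter + Lambda; tends to D = a at early times."""
    e_of_a = math.sqrt(omega_m / a ** 3 + omega_l)
    return 2.5 * omega_m * e_of_a * _growth_integral(a, omega_m, omega_l)


def growth_factor(z):
    """D_+(z), normalised to 1 at z = 0."""
    return _growth_unnormalised(1.0 / (1.0 + z)) / _growth_unnormalised(1.0)


def peak_height(z, sigma_m):
    """nu = delta_c / (D_+(z) sigma); sigma_m is a hypothetical input here."""
    return DELTA_C / (growth_factor(z) * sigma_m)


for z in (0.0, 1.5, 6.9):
    print(z, round(growth_factor(z), 4), round(peak_height(z, sigma_m=1.0), 3))
```

Because D₊ falls roughly as 1/(1+z) at high redshift, a halo of fixed σ corresponds to a much higher peak at z ≈ 7 than at low redshift.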

The >ν rareness statistic (and the corresponding exclusion curves) depends on the
region of the M_halo–z plane to which the survey is sensitive. We assume that the SPT
sample of lensed DSFGs is complete for z > 1.5. At lower redshift, the probability of
lensing is strongly suppressed, which means that the galaxy (or galaxies) associated
with a halo mass of more than about 10¹⁵ M☉ (the M_halo value of the exclusion curves
for z = 1.5) would have to have a very high intrinsic (that is, unlensed)
millimetre-wavelength flux density (more than about 20 mJy) to be included in the
sample. Because of the effects of downsizing (that is, star formation is terminated at
higher redshift in higher-mass galaxies than in lower-mass galaxies), it is unlikely
that massive galaxies at z < 1.5 would have sufficiently high infrared luminosity to be
detected by the SPT. We furthermore assume that the survey is complete for
M_halo > 10¹¹ M☉. This assumption is a conservative one because the galaxies hosted by
such haloes (which would have M_* ≲ 10¹¹ M☉) are unlikely to be sufficiently luminous to
be detected without being very strongly lensed (μ > 10); erring on the side of
overestimating the completeness yields a lower limit on the rareness. Substituting a
minimum halo mass of, for example, 10¹² M☉ would make the value of the >ν rareness
statistic less than that found for 10¹¹ M☉; that is, SPT0311-58 would be inferred to be
even rarer.

The total area from which the SPT DSFG sample was selected is 2,500 deg². However, the
fact that most of the SPT DSFGs are strongly lensed implies that the effective survey
area is potentially much less than 2,500 deg², because a galaxy must not only have a
high intrinsic millimetre-wavelength flux density to be included in the sample but also
be gravitationally lensed so that it exceeds the approximately 20-mJy threshold for
inclusion in redshift follow-up observations. Properly accounting for the effects of
lensing on the sample completeness would require defining an effective survey area as
a function of halo mass and redshift:
$A_{\rm eff}(M_{\rm halo}, z) = 2{,}500\,{\rm deg}^2 \times P(\mu_{\rm min} \mid M_{\rm halo}, z)$,
where P(μ_min | M_halo, z) is the probability of a galaxy hosted by a halo of mass
M_halo at redshift z being lensed by at least a factor μ_min, the minimum magnification
necessary for a halo of mass M_halo and redshift z to be detectable. However, given the
large uncertainties in determining such a function, we opt for a simpler approach.
Instead, in Fig. 3 we plot exclusion curves for the full sky (dotted line); for an area
of 2,500 deg² (dashed line), which corresponds to the assumption that all haloes in the
mass and redshift range specified above would be detected even if they were not
lensed; and for an area of 25 deg² (solid line), which corresponds to the assumption
that the survey area is only the approximately 1% of the SPT fields over which the
magnification for sources at z > 1.5 will be at least μ = 2, as for SPT0311-58.
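The role of the assumed area can be seen from the Poisson step alone. If N is the expected number of qualifying haloes in the survey, the probability of finding at least one is 1 − e^(−N), and an object is excluded at confidence 1 − α when this falls below α. The sketch scales a hypothetical full-sky expectation to the three areas used in Fig. 3. It is a simplification: the actual >ν calculation integrates the halo mass function over the survey's sensitive region of the M_halo–z plane, which is not reproduced here.

```python
import math

FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2  # about 41,253 deg^2


def expected_count(n_full_sky: float, area_deg2: float) -> float:
    """Expected number in a survey of the given area, scaled from a full-sky
    expectation (which would come from a halo mass function)."""
    return n_full_sky * area_deg2 / FULL_SKY_DEG2


def prob_at_least_one(n_expected: float) -> float:
    """Poisson probability of finding one or more objects."""
    return 1.0 - math.exp(-n_expected)


def is_excluded(n_expected: float, alpha: float = 0.32) -> bool:
    """True if finding such an object would have probability below alpha."""
    return prob_at_least_one(n_expected) < alpha


n_sky = 10.0  # hypothetical full-sky expectation above some (M_halo, z) threshold
for label, area in (("full sky", FULL_SKY_DEG2), ("2,500 deg^2", 2500.0), ("25 deg^2", 25.0)):
    n = expected_count(n_sky, area)
    print(f"{label}: N = {n:.4f}, P(>=1) = {prob_at_least_one(n):.4f}, "
          f"excluded: {is_excluded(n)}")
```

A smaller effective area lowers N, so the same object becomes less probable and the exclusion curve moves to lower halo masses. This is why the 25 deg² curve is the most stringent of the three.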

Code availability. The lensing reconstruction for the ALMA data was initially
performed using the visilens code (https://github.com/jspilker/visilens). Pixelated
reconstructions were performed using a proprietary code developed by a subset of the
authors and additional non-authors, and we opt not to release this code in connection
with this work. The rareness calculation was performed using publicly available code
(https://bitbucket.org/itrharrison/hh13-cluster-rareness). The image de-blending for
the Spitzer images used GALFIT
(https://users.obs.carnegiescience.edu/peng/work/galfit/galfit.html). The SED
modelling used the CIGALE code (https://cigale.lam.fr/), version 0.11.0. The
photometric redshift of the lens galaxy was estimated using EAZY
(https://github.com/gbrammer/eazy-photoz).
Data availability. This paper makes use of the following ALMA data:
ADS/JAO.ALMA#2016.1.01293.S and ADS/JAO.ALMA#2015.1.00504.S, available at
http://almascience.org/aq?projectcode=2015.1.00504.S and
http://almascience.org/aq?projectcode=2016.1.01293.S. The HST data are available online
at the Mikulski Archive for Space Telescopes (MAST; https://archive.stsci.edu) under
proposal ID 14740. Datasets analysed here are available from the corresponding author
on reasonable request.
Extended Data Figure 1 | ALMA continuum images of SPT0311-58. a–d, Continuum images in
ALMA bands 3 (a), 6 (b), 7 (c) and 8 (d), corresponding to rest-frame wavelengths of
380 μm, 160 μm, 110 μm and 90 μm, respectively. Note that the resolution in a is a
factor of roughly ten worse than in b–d, and the displayed field of view is also larger
by a factor of four. Contours at 10%, 30% and 90% of the image peak in band 6 are shown
in a for scale. The ALMA synthesized beam (full-width at half-maximum) is represented as
a hatched ellipse in the corner of each image.
Extended Data Figure 2 | Infrared and optical imaging of SPT0311-58. 8″ × 8″ thumbnails
of SPT0311-58 in the observed optical and infrared filters are shown. ALMA band 6
continuum contours at 30% and 4% of the image peak are shown in blue; the ALMA
synthesized beam is depicted as a blue ellipse in the corner of each image.
Extended Data Figure 3 | Optical, infrared and millimetre-wavelength image of
SPT0311-58. The field around SPT0311-58 is shown, as seen with ALMA and HST at 1.3 mm
(ALMA band 6; red), 1,300 nm (combined HST/WFC3 F125W and F160W filters; green) and
700 nm (combined HST/ACS F606W and F775W filters; blue). For emission from z = 6.9, no
emission should be visible in the ACS filters owing to the opacity of the neutral
intergalactic medium, whereas the other filters correspond to rest-frame 160 nm and
160 μm.
Extended Data Figure 5 | Gravitational lensing model of the dust continuum emission in
SPT0311-58. For each continuum wavelength for which we have suitable data, we
reconstruct the source-plane emission as described in Methods section ‘Gravitational
lens modelling’. For each wavelength, from left to right, we show the ‘dirty’ (not
de-convolved) image of the data, the dirty image of the model, the model residuals and
the source-plane reconstruction. Because the images of the data are not de-convolved,
the structure far from the object is due to side lobes in the synthesized beam, and
should be reproduced by the models. The image-plane region modelled is evident in the
residuals, and results in the ‘noise’ in the source-plane reconstructions. Contours in
the residual panels are drawn in steps of ±2σ. The lensing caustics are shown in each
source-plane panel (ellipse and diamond). The lens parameters are determined
independently at 90 μm and 160 μm; at 110 μm we adopt the parameters of the 160-μm
model.
Extended Data Figure 6 | Gravitational lensing model of the [C II] line in SPT0311-58.
For each channel (40 km s⁻¹ wide), we reconstruct the source-plane emission using the
lens parameters determined from fitting to the rest-frame 160-μm (ALMA band 6)
continuum data (Methods section ‘Gravitational lens modelling’). The four images for
each channel are as in Extended Data Fig. 5.
Extended Data Figure 7 | Optical to submillimetre-wavelength SED modelling for
SPT0311-58 E, SPT0311-58 W and the lens galaxy. The photometric data in Extended Data
Tables 2 and 3 for the three components at the position of SPT0311-58 are compared to
the models determined using the CIGALE SED modelling code. The lens is modelled assuming
a redshift of z_phot = 1.43, as estimated with the photometric redshift code EAZY. Upper
limits are shown at the 1σ threshold and error bars represent 1σ uncertainties.
Extended Data Table 1 | ALMA observations

Date | Frequencyᵃ (GHz) | Antennas | Resolution (arcsec) | Flux calibrator | Phase calibrator | PWVᵇ (mm) | t_intᶜ (min) | Noise levelᵈ (μJy/beam)
B3 | | | 3.3 × 3.5 | | | | | 35
2016-Jan-02 | 91.95 | 4 | 3.8 × 3.9 | Uranus | J0303-6211 | 1.8 | 12 | 65
2015-Dec-28 | 95.69 | 34 | 3.2 × 3.5 | Uranus | J0309-6058 | 2.9 | 1.2 | 83
2015-Dec-28 | 99.44 | 34 | 3.1 × 3.4 | Uranus | J0309-6058 | 2.8 | 1.2 | 77
2015-Dec-28 | 103.19 | 34 | 3.0 × 3.4 | Uranus | J0309-6058 | 2.1 | 1.5 | 72
2015-Dec-28 | 106.94 | 34 | 2.9 × 3.3 | Uranus | J0309-6058 | 2.8 | 1.0 | 95
B6 | | | | | | | |
2016-Nov-03 | 233.65 | 45 | 0.25 × 0.30 | J0334-4008 | J0303-6211 | 0.5 | 32.4 | 24
B7 | | | | | | | |
2016-Jun-04 | 343.48 | 4 | 0.31 × 0.49 | J2258-2758 | J0303-6211 | 0.8 | 6.5 | 12
B8 | | | 0.20 × 0.30 | | | | | 53
2016-Nov-15 | 423.63 | 41 | | J0538-4405 | J0253-5441 | 0.8 | 11.4 |
2016-Nov-16 | 423.63 | 42 | | J0538-4405 | J0253-5441 | 0.5 | 33.7 |
2016-Nov-16 | 423.63 | 42 | | J0538-4405 | J0253-5441 | 0.4 | 33.7 |
2016-Nov-17 | 423.63 | 43 | | J0538-4405 | J0253-5441 | 0.3 | 33.7 |

ᵃFirst local oscillator frequency.
ᵇPrecipitable water vapour (PWV) at the zenith.
ᶜOn-source integration time t_int.
ᵈRoot-mean-square noise level in the 7.5-GHz continuum image.
Extended Data Table 2 | Optical and infrared photometry

Telescope | Instrument/Filter | Lens | SPT0311-58E | SPT0311-58W
HST | ACS/F606W | >27.05 | >28.11 | >27.08
HST | ACS/F775W | >26.55 | >27.59 | >26.63
Gemini | GMOS/i′ | 25.00 ± 0.20 | – | –
Gemini | GMOS/z′ | 24.40 ± 0.20 | – | –
HST | WFC3/F125W | 23.06 ± 0.16 | 25.28 ± 0.10 | >26.69
HST | WFC3/F160W | 22.76 ± 0.15 | 24.98 ± 0.12 | >27.11
Gemini | FLAMINGOS/K 2.16 μm | 22.42 ± 0.13 | – | –
Spitzer | IRAC/Ch1 3.6 μm | 21.40 ± 0.14 | 24.47 ± 0.30 | (23.87 ± 0.28)
Spitzer | IRAC/Ch2 4.5 μm | 21.63 ± 0.13 | 24.45 ± 0.25 | (23.63 ± 0.22)

All data are given in apparent (not corrected for magnification) AB magnitudes.
Limiting magnitudes are reported as 1σ values. The magnification estimates for the E and
W sources are 1.3 and 2.1, respectively, as reported in Methods section ‘Gravitational
lens modelling’. IRAC photometry for SPT0311-58 W is uncertain owing to blending with
the lens, as noted in Methods section ‘Image de-blending’.
Extended Data Table 3 | Far-infrared photometry

Telescope | Observed wavelength | S_ν (E, intrinsic) | S_ν (W, intrinsic) | S_ν (total, apparent)
Herschel/SPIREᵃ | 250 μm | 1.9 ± 0.6 | 12.7 ± 4.2 | 29.0 ± 8.0
Herschel/SPIREᵃ | 350 μm | 2.5 ± 0.5 | 16.6 ± 2.9 | 38.0 ± 6.0
Herschel/SPIREᵃ | 500 μm | 3.5 ± 0.6 | 22.7 ± 4.2 | 52.0 ± 8.0
ALMA/B8 | 710 μm | 3.1 ± 0.2 | 19.9 ± 0.3 | –
ALMA/B7 | 869 μm | 2.9 ± 0.2 | 15.9 ± 0.25 | –
ALMA/B6 | 1.26 mm | 1.18 ± 0.05 | 9.77 ± 0.15 | –
ALMA/B3 | 3 mm | 0.040 ± 0.028 | 0.76 ± 0.02 | –

Flux densities (S_ν) are given in mJy.
ᵃHerschel photometry does not spatially resolve the two components; see Methods section
‘Gravitational lens modelling’ for details.
Exploring 4D quantum Hall physics with a 2D topological charge pump

Michael Lohse¹, Christian Schweizer¹,², Hannah M. Price³,⁴, Oded Zilberberg⁵ & Immanuel Bloch¹,²


The discovery of topological states of matter has greatly improved our understanding of
phase transitions in physical systems. Instead of being described by local order
parameters, topological phases are described by global topological invariants and are
therefore robust against perturbations. A prominent example is the two-dimensional
(2D) integer quantum Hall effect: it is characterized by the first Chern number, which
manifests in the quantized Hall response that is induced by an external electric field.
Generalizing the quantum Hall effect to four-dimensional (4D) systems leads to the
appearance of an additional quantized Hall response, but one that is nonlinear and
described by a 4D topological invariant, the second Chern number. Here we report the
observation of a bulk response with intrinsic 4D topology and demonstrate its
quantization by measuring the associated second Chern number. By implementing a 2D
topological charge pump using ultracold bosonic atoms in an angled optical superlattice,
we realize a dynamical version of the 4D integer quantum Hall effect. Using a small
cloud of atoms as a local probe, we fully characterize the nonlinear response of the
system via in situ imaging and site-resolved band mapping. Our findings pave the way to
experimentally probing higher-dimensional quantum Hall systems, in which additional
strongly correlated topological phases, exotic collective excitations and boundary
phenomena such as isolated Weyl fermions are predicted.

Topology, originally a branch of mathematics, has become an important concept in
different fields of physics, including particle physics, solid-state physics and
quantum computation. In this context, a hallmark achievement was the discovery of the
2D integer quantum Hall effect. This discovery demonstrated that the Hall conductance in
a perpendicular magnetic field, in response to an electric field E, is quantized. In a
cylindrical geometry, following Laughlin’s thought experiment, E can be generated by
varying the time-dependent magnetic flux Φ_x(t) along the axis (x) of the cylinder
(Fig. 1a). The interplay between the perpendicular magnetic field and the induced
electric field E creates a quantized Hall response in the x direction: an integer
number of particles, determined by the first Chern number, is transported between the
edges per quantum of magnetic flux that is threaded through the cylinder.

Dimensionality is crucial for topological phases, and many intriguing states were
recently discovered in three dimensions, such as Weyl semimetals and three-dimensional
(3D) topological insulators. Ascending further in dimensions, a 4D generalization of
the quantum Hall effect has been proposed in the context of astrophysics and
condensed-matter systems, and has received much attention in theoretical studies.
Unlike its 2D equivalent, the 4D quantum Hall effect can occur in systems with and
without time-reversal symmetry. The former constitutes the fundamental model from which
many lower-dimensional time-reversal-symmetric topological insulators can be derived.
Furthermore, a 4D quantum Hall system might exhibit relativistic collective hyperedge
excitations and new strongly correlated quantum Hall phases, revealing the interplay
between quantum correlations and dimensionality. This interest was renewed recently as
a result of the unprecedented control and flexibility made possible by engineered
systems such as ultracold atoms and photonics. Such systems have been used to study
various topological effects, including a measurement of the second Chern number in an
artificially generated parameter space, and offer a direct route for realizing 4D
physics using synthetic dimensions.

In the simplest case, a 4D quantum Hall system can be composed of two 2D quantum Hall
systems in orthogonal subspaces (Fig. 1a, b). In addition to the quantized linear
response that underlies the 2D quantum Hall effect, a 4D quantum Hall system exhibits a
quantized nonlinear 4D Hall response. The latter arises when a magnetic perturbation B
is added simultaneously with the perturbing electric field E. The 4D geometry implies
multiple possibilities for the orientation of E and B; however, the resulting nonlinear
response is always characterized by the same 4D topological invariant, the second Chern
number. Here, we focus on the geometry depicted in Fig. 1a, b, in which the nonlinear
response can be understood semi-classically as originating from a Lorentz force created
by B, which couples the motion in the two 2D quantum Hall systems. The direction of this
response is transverse to both perturbing fields. Hence, it can occur only in four or
more dimensions and has therefore never been observed in any physical system.

Topological charge pumps exhibit topological transport properties that are similar to those of higher-dimensional quantum Hall systems and provide a way to probe 4D quantum Hall physics in lower-dimensional dynamical systems. The first example of a topological charge pump was the one-dimensional (1D) Thouless pump, in which an adiabatic periodic modulation generates a quantized particle transport. This modulation can be parameterized by a pump parameter and, at each point in the cycle, the 1D system constitutes a Fourier component of a 2D quantum Hall system. The induced motion is thus equivalent to the linear Hall response and is characterized by the same 2D topological invariant, the first Chern number. Indeed, the quantum Hall effect on a cylinder can be mapped to a 1D charge pump with the threaded magnetic flux $\varphi_x$ acting as the pump parameter (Fig. 1a). Building on early condensed-matter experiments, topological charge pumps have recently been realized in photonic waveguides and by using ultracold atoms.
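To make the link between a pump cycle and the first Chern number concrete, the Python sketch below computes $\nu_1$ for the lowest band of a Rice–Mele chain on the $(k, \varphi)$ torus. It uses the Fukui–Hatsugai–Suzuki link-variable method, a standard numerical technique that is not necessarily the one used by the authors. Several inputs are illustrative assumptions:

- the two-site Bloch Hamiltonian;
- the unit hopping $J = 1$ and the lattice constant $a = 1$;
- the modulations $\delta J(\varphi) = \delta J_0\sin\varphi$ and $\Delta(\varphi) = \Delta_0\cos\varphi$, the deep tight-binding form for $d_l = 2d_s$ given in the Methods;
- the offset `dJ_shift`, which is hypothetical.

The overall sign of $\nu_1$ depends on orientation conventions.

```python
import numpy as np


def rice_mele_bloch(k, phi, J=1.0, dJ0=1.0, D0=1.0, dJ_shift=0.0):
    """2x2 Bloch Hamiltonian of a Rice-Mele chain (lattice constant a = 1).

    Basis: (even site, odd site) of one unit cell. Intra-cell hopping is
    J + dJ/2, inter-cell hopping J - dJ/2, on-site energies +/- Delta/2.
    dJ_shift is a hypothetical offset that moves the pump path in the
    (dJ, Delta) plane.
    """
    dJ = dJ0 * np.sin(phi) + dJ_shift
    delta = D0 * np.cos(phi)
    h_ab = -(J + dJ / 2) - (J - dJ / 2) * np.exp(-1j * k)
    return np.array([[delta / 2, h_ab], [np.conj(h_ab), -delta / 2]])


def pump_chern_number(n=48, **params):
    """First Chern number of the lowest band on the (k, phi) torus
    (Fukui-Hatsugai-Suzuki link-variable method)."""
    ks = np.linspace(-np.pi, np.pi, n, endpoint=False)
    phis = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    u = np.empty((n, n, 2), dtype=complex)
    for i, k in enumerate(ks):
        for j, phi in enumerate(phis):
            _, vecs = np.linalg.eigh(rice_mele_bloch(k, phi, **params))
            u[i, j] = vecs[:, 0]  # eigh sorts eigenvalues in ascending order

    def link(a, b):
        overlap = np.vdot(a, b)  # <a|b>
        return overlap / abs(overlap)

    flux = 0.0
    for i in range(n):
        for j in range(n):
            ip, jp = (i + 1) % n, (j + 1) % n
            loop = (link(u[i, j], u[ip, j]) * link(u[ip, j], u[ip, jp])
                    * link(u[ip, jp], u[i, jp]) * link(u[i, jp], u[i, j]))
            flux += np.angle(loop)  # Berry flux through one plaquette
    return float(flux / (2 * np.pi))


nu1 = pump_chern_number()
nu1_trivial = pump_chern_number(dJ_shift=2.0)  # path no longer encircles (0, 0)
print(round(nu1), round(nu1_trivial))
```

Shifting $\delta J$ by more than $\delta J_0$ moves the path away from the degeneracy point $(\delta J = 0, \Delta = 0)$. The Chern number then drops to zero, in line with the pump-path discussion in the Methods.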

A dynamical 4D quantum Hall effect can accordingly be realized by using a 2D topological charge pump. Using dimensional reduction, the Fourier components of a 4D quantum Hall system can be mapped onto a 2D system. For the geometry in Fig. 1a, b, the corresponding 2D model is a square superlattice (Fig. 1c, Methods), which consists of two 1D superlattices along the x and y directions, each formed by superimposing two lattices:

$$V_{s,\mu}\sin^2\!\left(\frac{\pi\mu}{d_{s,\mu}}\right) + V_{l,\mu}\sin^2\!\left(\frac{\pi\mu}{d_{l,\mu}} - \frac{\varphi_\mu}{2}\right),$$

with $\mu \in \{x, y\}$. Here, $d_{s,\mu}$ and $d_{l,\mu} > d_{s,\mu}$ denote the periods of the short and long lattices, respectively, and $V_{s,\mu}$ and $V_{l,\mu}$ the depths of the short and long lattice potentials. The position of the long lattice is determined by the corresponding superlattice phase $\varphi_\mu$.
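The Python sketch below evaluates the 1D superlattice potential above along one axis, in units where $d_s = 1$. The depths are illustrative, not the experimental values. It makes three properties explicit:

- changing $\varphi_\mu$ by $2\pi$ leaves the potential unchanged;
- the long lattice is displaced by $\varphi_\mu d_l/(2\pi)$;
- for $d_l = 2d_s$, a phase step of $\pi$ is equivalent to translating the whole potential by one short-lattice site $d_s$.

```python
import numpy as np


def superlattice(x, phi, v_s, v_l, d_s=1.0, d_l=2.0):
    """V_s sin^2(pi x/d_s) + V_l sin^2(pi x/d_l - phi/2) along one axis."""
    return (v_s * np.sin(np.pi * x / d_s) ** 2
            + v_l * np.sin(np.pi * x / d_l - phi / 2) ** 2)


x = np.linspace(-4.0, 4.0, 801)
phi = 0.3
v_s, v_l = 7.0, 20.0  # illustrative depths in recoil units

full_cycle = np.allclose(superlattice(x, phi + 2 * np.pi, v_s, v_l),
                         superlattice(x, phi, v_s, v_l))
# For d_l = 2 d_s a phase step of pi equals a translation by one site d_s = 1.
half_cycle = np.allclose(superlattice(x, phi + np.pi, v_s, v_l),
                         superlattice(x + 1.0, phi, v_s, v_l))
# The long lattice alone is the phi = 0 lattice displaced by phi * d_l / (2 pi).
long_shift = np.allclose(superlattice(x, phi, 0.0, v_l),
                         superlattice(x - phi * 2.0 / (2 * np.pi), 0.0, 0.0, v_l))
print(full_cycle, half_cycle, long_shift)  # True True True
```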

The phase $\varphi_x$ is chosen as the pump parameter; that is, pumping is performed by moving the long lattice along x. This method of pumping


¹Fakultät für Physik, Ludwig-Maximilians-Universität, Schellingstraße 4, 80799 München, Germany. ²Max-Planck-Institut für Quantenoptik, Hans-Kopfermann-Straße 1, 85748 Garching, Germany. ³INO-CNR BEC Center and Dipartimento di Fisica, Università di Trento, Via Sommarive 14, 38123 Povo, Italy. ⁴School of Physics and Astronomy, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK. ⁵Institut für Theoretische Physik, ETH Zürich, Wolfgang-Pauli-Straße 27, 8093 Zürich, Switzerland.
Figure 1 | 4D quantum Hall system and the corresponding 2D topological charge pump.

a, A 2D quantum Hall system on a cylinder pierced by a uniform magnetic flux $\Phi_{xz}$ (blue arrows). Threading a magnetic flux $\varphi_x(t)$ through the cylinder creates an electric field $E_z$ on the surface (red arrows), resulting in a linear Hall response along x with velocity $v_x$ (green arrow).

b, A 4D quantum Hall system can be composed of two 2D quantum Hall systems in the x-z and y-w planes. A weak magnetic perturbation $B_{xw}$ in the x-w plane couples the two systems and generates a Lorentz force $F_w$ (orange arrow) for particles moving along x. This force induces an additional nonlinear Hall response in the y direction with velocity $v_y$ (green arrow).

c, A dynamical version of the 4D quantum Hall system can be realized by using a topological charge pump in a 2D superlattice (blue potentials). Such a superlattice is created by superimposing two lattices with periods $d_s$ (grey) and $d_l > d_s$ (red) along both x and y, depicted here for $d_l = 2d_s$, as in the experiment. The black circles show the lattice sites that are formed by the potential minima, and the black (grey) lines indicate strong (weak) tunnel coupling between neighbouring sites. The system is modulated periodically by moving the long lattice adiabatically along x, mimicking the perturbing electric field $E_z$ in the 4D model. The magnetic perturbation $B_{xw}$ maps onto a small tilt angle $\theta$ of the long lattice along y with respect to the corresponding short lattice. In this case, the shape of the double wells along y depends on the position along x. The dashed red lines indicate the potential minima of the tilted long lattice.

d, The pumping shifts the cloud of atoms (grey) in the x direction (with velocity $v_x$), as per the quantized linear response of a 2D quantum Hall system. For non-zero $\theta$, the two orthogonal axes are coupled, leading to an additional quantized nonlinear response with 4D topology in the perpendicular y direction (with velocity $v_y$).

e, The velocity of the nonlinear response is determined by the product of the Berry curvatures $\Omega^x\Omega^y$ (see Methods; a.u., arbitrary units), depicted here for the lowest subband with $d_l = 2d_s$ and lattice depths as in Fig. 3. The left (right) torus shows a cut at $k_y = 0$, $\varphi_y = \pi/2$ ($k_x = \pi/(2d_s)$, $\varphi_x = \pi/2$) through the generalized 4D Brillouin zone spanned by $k_x$, $\varphi_x$, $k_y$ and $\varphi_y$.


is equivalent to threading the flux $\varphi_x$ in the 4D model, leading to a quantized motion along x (the linear response; Fig. 1c, d). The magnetic perturbation $B_{xw}$ corresponds to a transverse phase $\varphi_y$ that depends linearly on x and thereby couples the motion in the x and y directions (see Methods). We realize this by tilting the long y lattice relative to the short one by an angle $\theta \ll 1$ (Fig. 1c) such that $\varphi_y(x) = \varphi_y^0 + 2\pi\theta x/d_{l,y}$ to first order in $\theta$. When $\varphi_x$ is varied, the motion along x changes $\varphi_y$ and, analogously to the Lorentz force in 4D, induces a quantized nonlinear response along y, which is equivalent to the nonlinear Hall response of a 4D quantum Hall system (Fig. 1c, d).
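As a rough numerical illustration of the first-order relation $\varphi_y(x) = \varphi_y^0 + 2\pi\theta x/d_{l,y}$, the Python sketch below evaluates how much $\varphi_y$ changes across a cloud about 20 sites wide along x. That is the cloud size quoted later in the main text. The sketch uses $\theta = 0.54$ mrad (the angle of Fig. 3), lengths in units of $d_s$ and $d_{l,y} = 2d_s$. The helper name `transverse_phase` is hypothetical.

```python
import math


def transverse_phase(x, theta, phi_y0=0.0, d_l_y=2.0):
    """phi_y(x) = phi_y0 + 2*pi*theta*x/d_l_y (first order in theta; lengths in d_s)."""
    return phi_y0 + 2 * math.pi * theta * x / d_l_y


theta = 0.54e-3  # rad, tilt angle of Fig. 3
cloud_width = 20.0  # approximate cloud extent along x, in units of d_s
dphi = transverse_phase(cloud_width, theta) - transverse_phase(0.0, theta)
crossover_length = 2.0 / theta  # d_l_y / theta in units of d_s
print(f"change of phi_y across the cloud: {dphi:.4f} rad ({dphi / math.pi:.4f} pi)")
print(f"d_l_y / theta = {crossover_length:.0f} d_s")
```

The phase varies by only about one hundredth of $\pi$ across the cloud. The length $d_{l,y}/\theta$ of a few thousand sites sets the scale that separates a "small" cloud from a "large" one in the Methods.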
For a uniformly populated band in an infinite system, the centre-of-mass (COM) displacement during one cycle $\varphi_x = 0 \to 2\pi$ is

$$\Delta\mathbf{r}_{\mathrm{COM}} = \nu_1 a_x\mathbf{e}_x + \nu_2\,\theta\,\frac{a_x}{d_{l,y}}\,a_y\mathbf{e}_y,$$

with $a_x$ ($a_y$) the size of the superlattice unit cell and $\mathbf{e}_x$ ($\mathbf{e}_y$) the unit vector along x (y) (see Methods). The first term describes the quantized linear response along x. It is proportional to the first Chern number of the pump ($\nu_1$; denoted $\nu$ in ref. 31), which is obtained by integrating the Berry curvature

$$\Omega^x(k_x, \varphi_x) = i\left(\langle\partial_{\varphi_x}u|\partial_{k_x}u\rangle - \langle\partial_{k_x}u|\partial_{\varphi_x}u\rangle\right)$$

over the generalized 2D Brillouin zone spanned by the quasi-momentum $k_x$ and $\varphi_x$. Here, $|u(k_x, \varphi_x)\rangle$ denotes the eigenstate of a given non-degenerate band at $k_x$ and $\varphi_x$. Because $\nu_1$ can take only integer values, the motion is quantized. The second term is the nonlinear response in the y direction. It is quantified by a 4D integer topological invariant, the second Chern number of the pump (cf. ref. 31):

$$\nu_2 = \frac{1}{4\pi^2}\oint_{\mathrm{BZ}}\Omega^x\Omega^y\,dk_x\,dk_y\,d\varphi_x\,d\varphi_y,$$

where BZ indicates the generalized 4D Brillouin zone (Fig. 1e). Therefore, the nonlinear response is also quantized and has intrinsic 4D symmetries that result from the higher-dimensional non-commutative geometry.

We implement a 2D topological charge pump by using bosonic $^{87}$Rb atoms that form a Mott insulator in isolated planes of a 3D optical lattice with superlattices along x and y, with $d_s = d_{s,x} = d_{s,y}$ and $d_l = d_{l,x} = d_{l,y} = 2d_s$ (see Methods), creating double-well potentials along x and y (Fig. 1c). In the tight-binding limit, this implementation realizes a 2D Rice–Mele model in each plane with dimerized on-site energies and tunnel couplings between neighbouring sites in both directions (see Methods). The corresponding unit cell is a four-site plaquette, $a_x = a_y = 2d_s$, and the lowest band splits into four subbands.
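For the untilted lattice ($\theta = 0$) the 2D Rice–Mele model is separable. Its Bloch Hamiltonian for the four-site plaquette is therefore a Kronecker sum of two 1D Rice–Mele Bloch Hamiltonians.

The Python sketch below checks this numerically. It shows the four subbands at one point of the generalized Brillouin zone and confirms that the lowest one is the sum of the two lowest 1D bands. The parameters are illustrative: $J = 1$, $a = 1$, and the modulation form assumed from the Methods' deep tight-binding limit.

```python
import numpy as np


def rice_mele_1d(k, phi, J=1.0, dJ0=1.0, D0=1.0):
    """1D Rice-Mele Bloch Hamiltonian, basis (even site, odd site), a = 1."""
    dJ, delta = dJ0 * np.sin(phi), D0 * np.cos(phi)
    h_ab = -(J + dJ / 2) - (J - dJ / 2) * np.exp(-1j * k)
    return np.array([[delta / 2, h_ab], [np.conj(h_ab), -delta / 2]])


def rice_mele_2d(kx, phix, ky, phiy):
    """4x4 Bloch Hamiltonian of the separable (theta = 0) plaquette model."""
    eye = np.eye(2)
    return (np.kron(rice_mele_1d(kx, phix), eye)
            + np.kron(eye, rice_mele_1d(ky, phiy)))


kx, phix, ky, phiy = 0.4, 1.1, -0.7, 2.0  # an arbitrary point of the 4D zone
e2d = np.linalg.eigvalsh(rice_mele_2d(kx, phix, ky, phiy))  # four subbands
ex = np.linalg.eigvalsh(rice_mele_1d(kx, phix))
ey = np.linalg.eigvalsh(rice_mele_1d(ky, phiy))
print(len(e2d), np.isclose(e2d[0], ex[0] + ey[0]))
```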

In the experiment, we study the nonlinear response of the lowest subband, for which $\nu_2 = +1$ for $d_l = 2d_s$. Our main results are: (i) the observation of a 4D-like bulk response; (ii) the local probing of its 4D geometric properties; and (iii) the revealing of the 4D quantum Hall effect by demonstrating the quantization of the response. As the initial state, a quarter-filled Mott insulator that uniformly occupies the lowest subband is prepared at $\varphi_x = 0$ (see Methods). The pumping is performed along x by adiabatically varying $\varphi_x$; we examine the resulting motion of the atoms. We probe the system locally by using a small cloud of atoms that extends over approximately 20 sites in the x direction. In this case, the variation in $\varphi_y(x)$ over the cloud is negligible and the y displacement per cycle is given by $\nu_2(\varphi_y)\,\theta\,a_xa_y/d_{l,y}$ with

$$\nu_2(\varphi_y) = \frac{1}{2\pi}\oint\Omega^x\Omega^y\,dk_x\,dk_y\,d\varphi_x$$

(see Methods). From this local response, the quantized nonlinear response of an infinite system can be reconstructed by sampling all $\varphi_y \in [0, 2\pi)$, thereby integrating over the entire 4D Brillouin zone.
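For the separable model without tilt ($\theta = 0$), the local and global invariants can be worked out by hand. The derivation below is a sketch under that assumption. The lowest-subband eigenstate factorizes as $|u\rangle = |u_x(k_x,\varphi_x)\rangle\otimes|u_y(k_y,\varphi_y)\rangle$. Hence $\Omega^x$ depends only on $(k_x,\varphi_x)$ and $\Omega^y$ only on $(k_y,\varphi_y)$. Here $\nu_1^x$ and $\nu_1^y$ denote the first Chern numbers of the two 1D pumps.

$$\begin{aligned}
\nu_2(\varphi_y) &= \frac{1}{2\pi}\oint \Omega^x(k_x,\varphi_x)\,\Omega^y(k_y,\varphi_y)\,dk_x\,d\varphi_x\,dk_y = \nu_1^x\int\Omega^y(k_y,\varphi_y)\,dk_y,\\
\frac{1}{2\pi}\int_0^{2\pi}\nu_2(\varphi_y)\,d\varphi_y &= \nu_1^x\,\frac{1}{2\pi}\oint\Omega^y\,dk_y\,d\varphi_y = \nu_1^x\,\nu_1^y = \nu_2 .
\end{aligned}$$

With identical superlattices along x and y, $\nu_1^x = \nu_1^y = \pm 1$ and hence $\nu_2 = +1$. This agrees with the value stated above for the lowest subband. The $\varphi_y$ dependence of the local quantity comes entirely from the $k_y$-integrated $\Omega^y$, which is why $\nu_2(\varphi_y)$ can be strongly peaked while its average stays quantized.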


To probe the motion of the cloud, we measure its COM position as a function of $\varphi_x$. Because the nonlinear response results from two weak perturbations, the displacement per cycle is typically only a fraction of $d_l$. It is therefore too small to be resolved experimentally, because the number of experimental cycles is limited by heating. However, for suitable lattice parameters, signatures of the nonlinear drift, the key feature of the 4D Hall effect, can be seen at $\varphi_y = \pi/2$ (Fig. 2), at which $\nu_2(\varphi_y)$ is strongly peaked (see Fig. 1e). Unlike the linear response, this motion depends on $\theta$, demonstrating the intrinsically 4D character of the nonlinear response, which results from the two independent perturbations in orthogonal subspaces. This result demonstrates the existence of this dynamical, transverse, bulk phenomenon directly.
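To put numbers on "only a fraction of $d_l$", the Python sketch below evaluates the infinite-system nonlinear shift per cycle, $\nu_2\theta a_xa_y/d_{l,y}$. It uses $\nu_2 = 1$, $a_x = a_y = d_{l,y} = 2d_s$ and the two tilt angles of Fig. 2. This is a back-of-the-envelope estimate only: the local response near $\varphi_y = \pi/2$ probed in Fig. 2 exceeds this average because $\nu_2(\varphi_y)$ is peaked there.

```python
def nonlinear_shift_per_cycle(nu2, theta, a_x, a_y, d_l_y):
    """Infinite-system y displacement per pump cycle, nu2*theta*a_x*a_y/d_l_y."""
    return nu2 * theta * a_x * a_y / d_l_y


d_s = 1.0
for theta in (0.78e-3, -0.85e-3):  # rad, the two angles of Fig. 2
    dy = nonlinear_shift_per_cycle(1, theta, 2 * d_s, 2 * d_s, 2 * d_s)
    print(f"theta = {theta * 1e3:+.2f} mrad -> dy = {dy:+.5f} d_s per cycle")
```

Even after 100 cycles, this averaged shift would stay below the $\pm 0.3d_s$ systematic uncertainty quoted for Fig. 2.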
Figure 2 | 4D-like nonlinear centre-of-mass (COM) response.

a, Shift in the COM of the cloud of atoms along y ($\Delta y$) versus the number of pump cycles along x (represented by $\varphi_x$), measured for two different angles, $\theta_1 = 0.78(2)$ mrad (red) and $\theta_2 = -0.85(2)$ mrad (blue), with $\varphi_y = 0.500(5)\pi$. When pumping along x, the cloud moves in the perpendicular y direction with the sign depending on the pumping direction and the sign of $\theta$. $\Delta y$ is the differential displacement for $V_{s,x} = 7.0(2)E_{r,s}$, $V_{s,y} = 17.0(5)E_{r,s}$, $V_{l,x} = 20.0(6)E_{r,l}$ and $V_{l,y} = 80(3)E_{r,l}$, compared to a reference sequence with $V_{s,y} = 40(1)E_{r,s}$ and $V_{l,y} = 0E_{r,l}$ (see Methods). Here $E_{r,i} = h^2/(8md_i^2)$, with $i \in \{s, l\}$, denotes the corresponding recoil energy, with $m$ the mass of an atom. Each point is averaged 100 times and the error bar takes into account the error of the mean as well as a systematic uncertainty of $\pm 0.3d_s$.

b, Difference in the COM drift between $\theta_1$ and $\theta_2$ for the x (grey) and y (green) directions: $\Delta r_\mu = \Delta\mu(\theta_1) - \Delta\mu(\theta_2)$, with $\mu \in \{x, y\}$. The direction of the nonlinear response reverses when changing the sign of $\theta$, whereas the linear response is independent of $\theta$. Data are calculated from the measurements in a (see Methods).


To quantify this nonlinear response, instead of in situ imaging we use site-resolved band mapping, which measures the number of atoms on even ($N_e$) and odd ($N_o$) sites along y. This method enables us to determine the average double-well imbalance, $\mathcal{I}_y = (N_o - N_e)/(N_o + N_e)$, accurately. If no transitions between neighbouring unit cells along y occur, then $\mathcal{I}_y$ is related directly to the COM motion (see Methods).
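The imbalance and its link to the COM can be sketched as follows. For illustration, the sketch assumes that each double well consists of an even site and the odd site one short-lattice spacing $d_s$ above it. Under that assumption, and with no atoms leaving their unit cell, the COM relative to the double-well centre is $\mathcal{I}_y d_s/2$. The function names and atom numbers are hypothetical.

```python
def double_well_imbalance(n_even, n_odd):
    """I_y = (N_o - N_e) / (N_o + N_e)."""
    return (n_odd - n_even) / (n_odd + n_even)


def com_offset(imbalance, d_s=1.0):
    """COM relative to the double-well centre, assuming the odd site sits
    d_s above the even site and no inter-cell transitions occur."""
    return imbalance * d_s / 2


imb = double_well_imbalance(n_even=3000, n_odd=1000)
print(imb, com_offset(imb))  # -0.5 -0.25
```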
An example measurement of $\mathcal{I}_y(\varphi_x)$ is shown in Fig. 3a. The measured nonlinear response is smaller than expected for an ideal system, owing to the appearance of doubly occupied plaquettes and band excitations along y during the pumping and to a finite pumping efficiency along x (see Methods). Taking these imperfections into account, we find excellent agreement between the experimental data and the expected imbalance (Fig. 3a). By performing a linear fit to the differential double-well imbalance $\mathcal{I}_y(\varphi_x) - \mathcal{I}_y(-\varphi_x)$, we extract the change in the population imbalance during one cycle, $\delta\mathcal{I}_y = \mathcal{I}_y(\varphi_x = 2\pi) - \mathcal{I}_y(\varphi_x = 0)$ (see Methods). For a homogeneously populated band, this slope is determined by $\nu_2(\varphi_y)$ and thus characterizes the transport properties of the system.
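A minimal sketch of the slope extraction assumes that the imbalance grows linearly with the number of cycles $n = \varphi_x/2\pi$, that is, $\mathcal{I}_y = \mathcal{I}_0 + \delta\mathcal{I}_y\,n$. Under that model the differential signal removes the offset $\mathcal{I}_0$ and has slope $2\delta\mathcal{I}_y$ per cycle. The data below are synthetic and noise-free, chosen only to check the arithmetic. This is not necessarily the authors' fit model.

```python
import numpy as np


def slope_from_differential(cycles, imb_forward, imb_backward):
    """Fit I(+phi) - I(-phi) = 2 * dI * n (n = phi_x / 2pi) and return dI."""
    diff = np.asarray(imb_forward) - np.asarray(imb_backward)
    slope, _intercept = np.polyfit(cycles, diff, 1)
    return slope / 2


n = np.arange(0, 6)  # number of pump cycles
offset, true_dI = 0.01, 0.02  # synthetic values
forward = offset + true_dI * n  # pumping with +phi_x
backward = offset - true_dI * n  # pumping with -phi_x
print(round(slope_from_differential(n, forward, backward), 6))  # 0.02
```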

To reconstruct the quantized response of an infinite system and thereby obtain $\nu_2$, we repeat the measurement of $\mathcal{I}_y(\varphi_x)$ for different $\varphi_y$ starting from the same initial position. This is equivalent to using the small cloud of atoms as a local probe at different positions along x for fixed $\varphi_y^0$ (Fig. 3b). To demonstrate the quantization of the nonlinear response, we determine the second Chern number of the lowest subband by averaging $\delta\mathcal{I}_y$ over $\varphi_y \in [0, 2\pi)$. For symmetry reasons, it is sufficient to restrict $\varphi_y$ to $[0, \pi)$ for $d_l = 2d_s$ (see Methods). In this interval, the nonlinear response has large contributions only in the vicinity of $\varphi_y = \pi/2$. For the range of data shown in Fig. 3c, this process gives $\nu_2^{\mathrm{exp}} = 0.8(2)$, with the error resulting from the fit and the uncertainty in $\theta$. By taking the above-mentioned experimental imperfections into account, we isolate the contribution from the lowest subband, $\delta\mathcal{I}_y^{\mathrm{GS}}$ (see Methods). The experimentally determined slope of the nonlinear response for ground-state atoms agrees very well with the slope expected in an ideal system (Fig. 3c). To determine $\nu_2^{\mathrm{exp}}$, the ideal slope is fitted to the measured one by scaling it with a global amplitude, $(\nu_2^{\mathrm{exp}}/\nu_2)\,\delta\mathcal{I}_y^{\mathrm{ideal}}(\varphi_y)$. This yields $\nu_2^{\mathrm{exp}} = 1.07(8)$, in agreement with the expected value of $\nu_2 = +1$. The error here additionally takes into account the uncertainties in the lattice depths.
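Fitting a single global amplitude to a fixed theoretical curve is a one-parameter linear least-squares problem. Its closed-form solution is $A = \sum_i w_i y_i f_i / \sum_i w_i f_i^2$, with optional weights $w_i = 1/\sigma_i^2$.

The Python sketch below applies this formula to hypothetical values. The Gaussian "ideal" curve peaked at $\varphi_y = \pi/2$ is a stand-in, not the calculated slope of Fig. 3c. The synthetic "data" are that stand-in scaled by 0.9.

```python
import numpy as np


def global_amplitude(measured, ideal, sigma=None):
    """Least-squares A minimising sum(((measured - A * ideal) / sigma)**2)."""
    measured = np.asarray(measured, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    if sigma is None:
        w = np.ones_like(ideal)
    else:
        w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    return np.sum(w * measured * ideal) / np.sum(w * ideal ** 2)


phi_y = np.linspace(0.0, np.pi, 21, endpoint=False)
ideal = 0.03 * np.exp(-((phi_y - np.pi / 2) / 0.2) ** 2)  # hypothetical stand-in
measured = 0.9 * ideal  # synthetic "data"
print(round(global_amplitude(measured, ideal), 6))  # 0.9
```

With $\nu_2 = 1$, the fitted $A$ plays the role of $\nu_2^{\mathrm{exp}}/\nu_2$ in the scaling described above.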

In the 4D quantum Hall system, the defining feature of the nonlinear response is its linear dependence on the magnetic perturbation. The same scaling is thus expected for the 2D charge pump with respect to $\theta$.
Figure 3 | Local probing of the quantized nonlinear bulk response for $\theta = 0.54(3)$ mrad.

a, Double-well imbalance $\mathcal{I}_y$ versus the number of pump cycles in the x direction at $\varphi_y = 0.500(5)\pi$, $V_{s,x} = V_{s,y} = 7.0(2)E_{r,s}$ and $V_{l,x} = V_{l,y} = 20.0(6)E_{r,l}$. The data are the average of 14 measurements for the point at $\varphi_x = 0$ and 7 measurements for all others; the error is the error of the mean. The dashed line shows the response of an ideal system; the solid line includes corrections for the finite pumping efficiency along x and for the creation of doubly occupied plaquettes and band excitations along y. Both curves are shifted by a constant offset $\mathcal{I}_0 = 0.002$ (see Methods). For simplicity, the theoretical curves assume a homogeneous Berry curvature $\Omega^x = \nu_1 a_x/(2\pi)$, neglecting the variation in $\Omega^x$ during a pump cycle.

b, The response of an infinite system can be reconstructed with a small cloud of atoms by repeating the measurement from a for different values of $\varphi_y$. A single measurement probes the response locally at the position of the cloud (grey frames on the left). Changing $\varphi_y$ is equivalent to sampling a different position in the lattice (magnified frames on the right). Note that the tilt of the long y lattice (indicated by the red solid line, as in Fig. 1c) is greatly exaggerated compared to the angle used in the experiment.

c, Change in the double-well imbalance per cycle for the lowest band ($\delta\mathcal{I}_y^{\mathrm{GS}}$) as a function of $\varphi_y$. $\delta\mathcal{I}_y^{\mathrm{GS}}$ is determined by the integrated Berry curvature $\nu_2(\varphi_y)$ and so exhibits a pronounced peak around $\varphi_y = \pi/2$ (see Fig. 1e and Methods). The slope $\delta\mathcal{I}_y^{\mathrm{GS}}$ is extracted from a fit to the measured imbalance $\mathcal{I}_y(\varphi_x)$ (see Methods), and the solid line is the theoretically expected slope. Error bars show the fit error, and the blue-shaded region indicates the uncertainty of the theoretical curve that results from the errors in $\theta$ and the lattice depths. The insets show two additional examples of individual measurements of $\mathcal{I}_y(\varphi_x)$ (for the values of $\varphi_y$ indicated by the grey shading), as in a.
Figure 4 | Scaling of the 4D-like response with the tilt angle $\theta$. The linear dependence on $\theta$ reveals the nonlinear character of the response, demonstrating that it is induced by two independent perturbations, $\partial\varphi_x/\partial t$ and $\theta$. The slope $\delta\mathcal{I}_y^{\mathrm{GS}}$ is determined as a function of $\theta$ at $\varphi_y = 0.500(5)\pi$ by measuring the double-well imbalance when pumping along x, as described in Fig. 3 and using the same lattice depths. The solid line shows the slope that is expected for an ideal system. The fit errors for $\delta\mathcal{I}_y^{\mathrm{GS}}$ are smaller than the size of the data points, and the insets show two examples of the measurement of $\mathcal{I}_y(\varphi_x)$ (for the values of $\theta$ indicated by the grey shading), as in Fig. 3a.


We verify this by measuring the peak slope $\delta\mathcal{I}_y^{\mathrm{GS}}$ at $\varphi_y = \pi/2$ as a function of $\theta$ (Fig. 4). Doing so also provides another way of obtaining the second Chern number, by determining the slope of $\delta\mathcal{I}_y^{\mathrm{GS}}(\theta)$ (see Methods). This linear fit gives $\nu_2^{\mathrm{exp}} = 1.01(8)$, where the error is determined as described above. Furthermore, we confirm that the peak slope at fixed $\theta$ scales with the depth of the short y lattice $V_{s,y}$, as expected (Extended Data Fig. 1, Methods). In particular, the direction of the nonlinear response is independent of $V_{s,y}$, indicating its robustness against perturbations of the system.

In conclusion, we present an observation of a dynamical 4D quantum Hall effect, opening up a route to studying higher-dimensional quantum Hall physics experimentally. Extending our work, additional density-type nonlinear responses that are implied by the intrinsic 4D symmetry of a 2D charge pump can be measured. By adding a spin-dependent Yang–Mills gauge field, a dynamical version of the time-reversal-symmetric 4D quantum Hall effect, which exhibits a ground state with SO(5) symmetry, could be realized. Including interactions may yield intriguing fractional phases that originate in the 4D fractional quantum Hall effect, similarly to previous proposals for 1D charge pumps, and might enable the study of open questions in the context of Floquet engineering. Going beyond the limit of weak perturbations, quantized electric quadrupole moments could be observed in spatially frustrated systems with $\theta = \pi/4$ (ref. 29). Furthermore, a quantum Hall system with four extended dimensions might be realized with cold atoms using recently demonstrated techniques for creating synthetic dimensions. In finite systems, this would permit the observation of boundary phenomena such as isolated Weyl points. Ultimately, the ability to experimentally realize 4D quantum Hall systems could provide insight into lattice quantum chromodynamics models based on the Yang–Mills theory, and even quantum gravity.

We note that, simultaneously with this work, complementary results on topological edge states in 2D photonic pumps have been obtained.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Hall response of the 4D quantum Hall system. Assuming perfect adiabaticity, the Hall response of the 4D system shown in Fig. 1a, b can be evaluated from the semi-classical equations of motion for a wave packet centred at position $\mathbf r$ and quasi-momentum $\mathbf k$ (ref. 32):

$$\dot r^\mu = \frac{1}{\hbar}\frac{\partial E(\mathbf k)}{\partial k_\mu} + \dot k_\nu\,\Omega^{\nu\mu}(\mathbf k),\qquad \hbar\dot k_\mu = qE_\mu + q\,\dot r^\nu B_{\mu\nu}$$

Here, $E(\mathbf k)$ is the energy of the respective eigenstate at $\mathbf k$, $q$ is the charge of the particle and Einstein notation is used for the spatial indices $\mu, \nu \in \{w, x, y, z\}$. The orientation of the axes in Fig. 1a, b is chosen such that the 4D Levi-Civita symbol is $\varepsilon_{wxyz} = +1$. The velocity of the wave packet $\mathbf v = \dot{\mathbf r}$ has two contributions: the group velocity, which arises from the dispersion of the band, and the anomalous velocity, which is due to the non-zero Berry curvature

$$\Omega^{\nu\mu}(\mathbf k) = i\left(\langle\partial_{k_\nu}u|\partial_{k_\mu}u\rangle - \langle\partial_{k_\mu}u|\partial_{k_\nu}u\rangle\right)$$

For a filled or homogeneously populated band, the group velocity term vanishes and, with $\mathbf E = E_z\mathbf e_z$ and $\mathbf B = 0$, the linear Hall response is given by the COM velocity

$$\mathbf v_{\mathrm{COM}} = \frac{q}{h}A_{xz}E_z\nu_1\mathbf e_x$$

where $A_{xz}$ denotes the size of the magnetic unit cell in the x-z plane and

$$\nu_1 = \frac{1}{2\pi}\int_{\mathrm{BZ}}\Omega^{zx}\,d^2k$$

denotes the first Chern number of the 2D quantum Hall system in the x-z plane. The integration is performed over the 2D Brillouin zone (BZ) spanned by $k_x$ and $k_z$.

Adding the perturbing magnetic field $B_{xw}$ generates a Lorentz force that acts on the moving cloud, $\hbar\dot{\mathbf k} = qE_z\mathbf e_z - qv_xB_{xw}\mathbf e_w$. (This additional force can alternatively be interpreted as arising from a Hall voltage in the w direction that is created by the current along x in the presence of $B_{xw}$.) This force in turn induces an additional anomalous velocity along y, giving rise to the nonlinear Hall response. The resulting average velocity is then

$$\mathbf v_{\mathrm{COM}} = \frac{q}{h}A_{xz}E_z\nu_1\mathbf e_x - \left(\frac{q}{h}\right)^2A_{4D}E_zB_{xw}\nu_2\mathbf e_y$$

with $A_{4D}$ the size of the 4D magnetic unit cell. The second Chern number is given by

$$\nu_2 = \frac{1}{4\pi^2}\int_{\mathrm{BZ}}\left(\Omega^{xy}\Omega^{zw} + \Omega^{wx}\Omega^{zy} + \Omega^{zx}\Omega^{yw}\right)d^4k$$

where BZ denotes the 4D Brillouin zone.

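Tight-binding Hamiltonian of the 2D superlattice. In the tight-binding limit, the motion of non-interacting atoms in a 2D superlattice is captured by the following Hamiltonian:

$$\begin{aligned}
\hat H_{2D}(\varphi_x,\varphi_y) = &-\sum_{m_x,m_y}\left[J_x + \delta J_x^{m_x}(\varphi_x)\right]\hat a^\dagger_{m_x+1,m_y}\hat a_{m_x,m_y} + \mathrm{h.c.}\\
&-\sum_{m_x,m_y}\left[J_y + \delta J_y^{m_y}(\varphi_y)\right]\hat a^\dagger_{m_x,m_y+1}\hat a_{m_x,m_y} + \mathrm{h.c.}\\
&+\sum_{m_x,m_y}\left[\Delta_x^{m_x}(\varphi_x) + \Delta_y^{m_y}(\varphi_y)\right]\hat n_{m_x,m_y}
\end{aligned}\qquad (1)$$

Here, $\hat a^\dagger_{m_x,m_y}$ ($\hat a_{m_x,m_y}$) is the creation (annihilation) operator acting on the $(m_x, m_y)$th site in the x-y plane, and $\hat n_{m_x,m_y}$ is the corresponding number operator. The first (second) term describes the hopping between neighbouring sites along the x axis (y axis), with tunnelling matrix elements $J_\mu + \delta J_\mu^{m_\mu}$, with $\mu \in \{x, y\}$. The last term contains the on-site potential of each lattice site, $\Delta_x^{m_x} + \Delta_y^{m_y}$. In the presence of the long lattices, the tunnel couplings and on-site energies are modulated periodically by $\delta J_\mu^{m_\mu}$ and $\Delta_x^{m_x} + \Delta_y^{m_y}$, respectively. Both modulations depend on the respective superlattice phases $\varphi_\mu$.

For the lattice configuration used in the experiment, where $d_{l,\mu} = 2d_{s,\mu}$, these modulations can be expressed as $(-1)^{m_\mu}\delta J_\mu(\varphi_\mu)/2$ and $(-1)^{m_\mu}\Delta_\mu(\varphi_\mu)/2$, and equation (1) reduces to the 2D Rice–Mele Hamiltonian

$$\begin{aligned}
\hat H_{2D}(\varphi_x,\varphi_y) = &-\sum_{m_x,m_y}\left[J_x + (-1)^{m_x}\frac{\delta J_x(\varphi_x)}{2}\right]\hat a^\dagger_{m_x+1,m_y}\hat a_{m_x,m_y} + \mathrm{h.c.}\\
&-\sum_{m_x,m_y}\left[J_y + (-1)^{m_y}\frac{\delta J_y(\varphi_y)}{2}\right]\hat a^\dagger_{m_x,m_y+1}\hat a_{m_x,m_y} + \mathrm{h.c.}\\
&+\sum_{m_x,m_y}\frac{1}{2}\left[(-1)^{m_x}\Delta_x(\varphi_x) + (-1)^{m_y}\Delta_y(\varphi_y)\right]\hat n_{m_x,m_y}
\end{aligned}$$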
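As a consistency check of the reduced Hamiltonian, the Python sketch below builds the single-particle matrix of one axis of the Rice–Mele form of equation (1), on a ring of $L$ sites. It uses the modulations $(-1)^m\delta J/2$ and $(-1)^m\Delta/2$. It then compares the ring spectrum with the two bands of the corresponding Bloch Hamiltonian at the $L/2$ allowed quasi-momenta. The parameter values are illustrative, and $J$ sets the energy unit.

```python
import numpy as np


def rice_mele_ring(L, J, dJ, delta):
    """Single-particle matrix of the 1D Rice-Mele chain on a ring of L sites
    (L even): hopping -(J + (-1)^m dJ/2) between sites m and m+1 and
    on-site energy (-1)^m delta/2."""
    h = np.zeros((L, L))
    for m in range(L):
        sign = (-1) ** m
        h[m, m] = sign * delta / 2
        t = -(J + sign * dJ / 2)
        h[(m + 1) % L, m] = t
        h[m, (m + 1) % L] = t
    return h


def bloch_bands(k, J, dJ, delta):
    """Both Rice-Mele bands at quasi-momentum k (two-site unit cell of length 1)."""
    h_ab = -(J + dJ / 2) - (J - dJ / 2) * np.exp(-1j * k)
    e = np.sqrt((delta / 2) ** 2 + abs(h_ab) ** 2)
    return -e, e


L, J, dJ, delta = 40, 1.0, 0.6, 0.8
real_space = np.sort(np.linalg.eigvalsh(rice_mele_ring(L, J, dJ, delta)))
ks = 2 * np.pi * np.arange(L // 2) / (L // 2)  # allowed quasi-momenta
momentum_space = np.sort(np.array([bloch_bands(k, J, dJ, delta) for k in ks]).ravel())
print(np.allclose(real_space, momentum_space))  # True
```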


Mapping a 2D topological charge pump to a 4D quantum Hall system. The Hamiltonian of a 2D topological charge pump for a given set of parameters $\{\varphi_x, \varphi_y\}$, $\hat H_{2D}(\varphi_x,\varphi_y)$, can be interpreted as a Fourier component of a higher-dimensional quantum Hall system. Using the approach of dimensional extension, a 2D charge pump can be mapped onto a 4D quantum Hall system, the Fourier components of which are sampled sequentially during a pump cycle. This is demonstrated below for the deep tight-binding limit of the short lattices along x and y. In that limit, the corresponding 4D system consists of two 2D Harper–Hofstadter–Hatsugai models in the x-z and y-w planes. A similar analogy can be made in the opposite limit of vanishing short-lattice depths, $V_{s,x} \to 0$ and $V_{s,y} \to 0$. In this case, each axis of the 2D lattice maps onto the Landau levels of a free particle in an external magnetic field in two dimensions. For the lowest band, these two limiting cases are topologically equivalent; that is, they are connected by a smooth crossover without closing the gap to the first excited band. The topological invariants that govern the linear and nonlinear response are thus independent of the depth of the short lattices.

In the deep tight-binding regime, $J_x$ and $J_y$ become independent of the superlattice phases, and the modulations can be approximated as

$$\delta J_\mu^{m_\mu}(\varphi_\mu) = \frac{\delta J_\mu^{(0)}}{2}\cos\!\left[\Phi_\mu\left(m_\mu + \tfrac{1}{2}\right) - \varphi_\mu\right],\qquad \Delta_\mu^{m_\mu}(\varphi_\mu) = \frac{\Delta_\mu^{(0)}}{2}\cos\!\left(\Phi_\mu m_\mu - \varphi_\mu\right),$$

with $\Phi_x = 2\pi d_{s,x}/d_{l,x}$ and $\Phi_y = 2\pi d_{s,y}/d_{l,y}$. $\delta J_\mu^{(0)}$ and $\Delta_\mu^{(0)}$ denote the modulation amplitudes, which are determined by the lattice depths. In this case, $\hat H_{2D}$ is equivalent to the generalized 2D Harper model, which describes the Fourier components of a 4D lattice model with two uniform magnetic fields in orthogonal subspaces.
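For $\Phi_\mu = \pi$, that is $d_{l,\mu} = 2d_{s,\mu}$, the cosine modulations above reduce to the staggered Rice–Mele form $(-1)^m\delta J(\varphi)/2$ and $(-1)^m\Delta(\varphi)/2$. Here $\delta J(\varphi) = \delta J^{(0)}\sin\varphi$ and $\Delta(\varphi) = \Delta^{(0)}\cos\varphi$, because $\cos(a + \pi m) = (-1)^m\cos a$. The Python sketch below checks this identity numerically for illustrative amplitudes.

```python
import numpy as np

dJ0, D0 = 0.8, 1.3  # illustrative modulation amplitudes
big_phi = np.pi  # Phi = 2*pi*d_s/d_l with d_l = 2*d_s
m = np.arange(10)[:, None]  # site index (column)
phi = np.linspace(0.0, 2 * np.pi, 50)[None, :]  # superlattice phase (row)

# Cosine form of the deep tight-binding modulations
dJ_cos = dJ0 / 2 * np.cos(big_phi * (m + 0.5) - phi)
D_cos = D0 / 2 * np.cos(big_phi * m - phi)

# Staggered Rice-Mele form
dJ_rm = (-1.0) ** m * dJ0 * np.sin(phi) / 2
D_rm = (-1.0) ** m * D0 * np.cos(phi) / 2

ok_dJ, ok_D = np.allclose(dJ_cos, dJ_rm), np.allclose(D_cos, D_rm)
print(ok_dJ, ok_D)  # True True
```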
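The 4D parent Hamiltonian is obtained via an inverse Fourier transform:

$$\hat H_{4D} = \frac{1}{4\pi^2}\int_0^{2\pi}\!\!\int_0^{2\pi}\hat H_{2D}(\varphi_x,\varphi_y)\,d\varphi_x\,d\varphi_y$$

with

$$\hat a^\dagger_{m_x,m_y} = \sum_{m_z,m_w}e^{i(\varphi_x m_z + \varphi_y m_w)}\hat a^\dagger_{\mathbf m},\qquad \hat a_{m_x,m_y} = \sum_{m_z,m_w}e^{-i(\varphi_x m_z + \varphi_y m_w)}\hat a_{\mathbf m}$$

and where $\mathbf m = \{m_x, m_y, m_z, m_w\}$ indicates the position in the 4D lattice. This yields

$$\hat H_{4D} = \hat H_{xz} + \hat H_{yw} + \hat H_{\delta J}$$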


The first term ($\hat H_{xz}$) describes a 2D Harper–Hofstadter model in the x-z plane with a uniform magnetic flux per unit cell $\Phi_{xz} = (d_{s,x}/d_{l,x})\Phi_0$, with $\Phi_0$ denoting the magnetic flux quantum:

$$\hat H_{xz} = -\sum_{\mathbf m}J_x\,\hat a^\dagger_{\mathbf m+\mathbf e_x}\hat a_{\mathbf m} + \mathrm{h.c.} + \sum_{\mathbf m}\frac{\Delta_x^{(0)}}{4}\,e^{i\Phi_x m_x}\,\hat a^\dagger_{\mathbf m+\mathbf e_z}\hat a_{\mathbf m} + \mathrm{h.c.}$$

Correspondingly, the second term ($\hat H_{yw}$) is an independent 2D Harper–Hofstadter model in the y-w plane with $\Phi_{yw} = (d_{s,y}/d_{l,y})\Phi_0$. Owing to the positional dependence of the transverse superlattice phase $\varphi_y$, this term also contains the magnetic perturbation, that is, a weak homogeneous magnetic field in the x-w plane:

$$\hat H_{yw} = -\sum_{\mathbf m}J_y\,\hat a^\dagger_{\mathbf m+\mathbf e_y}\hat a_{\mathbf m} + \mathrm{h.c.} + \sum_{\mathbf m}\frac{\Delta_y^{(0)}}{4}\,e^{i(\Phi_y m_y + \Phi_{xw} m_x)}\,\hat a^\dagger_{\mathbf m+\mathbf e_w}\hat a_{\mathbf m} + \mathrm{h.c.}$$

with $\Phi_{xw} = -2\pi\theta d_{s,x}/d_{l,y}$. The strength of the perturbing magnetic field $B_{xw}$ is then set by $\Phi_{xw}$ and by $d_w$, the lattice spacing along w. For $\delta J_\mu^{(0)} \neq 0$, the third contribution ($\hat H_{\delta J}$) leads to the appearance of additional next-nearest-neighbour tunnel coupling elements in the x-z and y-w planes, with amplitudes of $\delta J_x^{(0)}/4$ and $\delta J_y^{(0)}/4$, respectively. The individual 2D models without the magnetic perturbation $B_{xw}$ then correspond to the Harper–Hofstadter–Hatsugai model with uniform magnetic fluxes $\Phi_{xz}$ and $\Phi_{yw}$, the same fluxes as for $\delta J_\mu^{(0)} = 0$.

Transport properties of a 2D topological charge pump. When the pump parameter $\varphi_x$ is changed slowly, a particle that is initially in an eigenstate $|u(k_x, \varphi_x(t=0), k_y, \varphi_y)\rangle$ of the 2D superlattice Hamiltonian $\hat H_{2D}$ (equation (1)) will adiabatically follow the corresponding instantaneous eigenstate $|u(k_x, \varphi_x(t), k_y, \varphi_y)\rangle$. In the absence of a tilt ($\theta = 0$), the particle acquires an anomalous velocity $\Omega^x\partial_t\varphi_x\,\mathbf e_x$ during this evolution, analogously to the linear Hall response in a quantum Hall system. In this case, the Berry curvature $\Omega^x$ is defined in a 4D generalized Brillouin zone $(k_x, \varphi_x, k_y, \varphi_y)$:

$$\Omega^x(k_x,\varphi_x,k_y,\varphi_y) = i\left(\langle\partial_{\varphi_x}u|\partial_{k_x}u\rangle - \langle\partial_{k_x}u|\partial_{\varphi_x}u\rangle\right)$$

For a homogeneously populated band, the COM displacement along x during one cycle is obtained by integrating the average anomalous velocity over one period. It can be expressed as an integral of the Berry curvature over the 2D generalized Brillouin zone spanned by $k_x$ and $\varphi_x$, and is therefore determined by the first Chern number of the pump:

$$\nu_1 = \frac{1}{2\pi}\oint\Omega^x\,dk_x\,d\varphi_x$$

When a tilt is present ($\theta \neq 0$), this motion along x leads to a change in $\varphi_y$. This induces an additional anomalous velocity in the y direction, giving rise to the nonlinear response. Neglecting the contribution from the group velocity (which averages to zero for a homogeneously populated band), we obtain for a given eigenstate

$$v_y(k_x,\varphi_x,k_y,\varphi_y) = \Omega^y\,\partial_t\varphi_y = \frac{2\pi\theta}{d_{l,y}}\,\Omega^x\Omega^y\,\partial_t\varphi_x \qquad (2)$$

The distribution of $\Omega^x\Omega^y$ in the 4D generalized Brillouin zone is shown in Fig. 1e for the lattice parameters used for the measurements in Figs 3 and 4. It exhibits pronounced peaks around $\varphi_x \in \{\pi/2, 3\pi/2\}$ and $\varphi_y \in \{\pi/2, 3\pi/2\}$. For $d_l = 2d_s$, $\Omega^x\Omega^y$ is $\pi$-periodic in both $\varphi_x$ and $\varphi_y$, because the corresponding eigenstates are related by a gauge transformation owing to the translational symmetry of the superlattice potential.

For a small cloud that homogeneously populates a single band, as in the experiment, the variation in $\Omega^x\Omega^y$ over the size of the cloud along x ($L_x$) due to the position dependence of $\varphi_y$ is negligible for $L_x \ll d_{l,y}/\theta$. The average velocity for the nonlinear response can then be calculated by averaging equation (2) over both quasi-momenta $k_x$ and $k_y$. The COM displacement after a complete cycle can be determined by integrating the velocity over one period. We can thus express the change in the COM position per cycle as

$$\Delta y_{\mathrm{COM}} = \frac{\theta\,a_xa_y}{2\pi\,d_{l,y}}\oint\Omega^x\Omega^y\,dk_x\,dk_y\,d\varphi_x \qquad (3)$$

where $\varphi_y = \varphi_y(x)$ is evaluated at the position of the cloud. If the number of pump cycles is small, then the change in $\varphi_y$ as a result of the linear pumping response can be neglected. The nonlinear displacement per cycle is then very well approximated by $\Delta y_{\mathrm{COM}} \approx \nu_2(\varphi_y)\,\theta\,a_xa_y/d_{l,y}$, with $\nu_2(\varphi_y) = \frac{1}{2\pi}\oint\Omega^x\Omega^y\,dk_x\,dk_y\,d\varphi_x$.

The response of a large system with $L_x \gg d_{l,y}/\theta$ can be obtained by averaging equation (3) over $\varphi_y(x) \in [0, 2\pi)$, yielding

$$\Delta y_{\mathrm{COM}} = \nu_2\,\theta\,\frac{a_xa_y}{d_{l,y}}$$

where the second Chern number $\nu_2$ is calculated by integrating $\Omega^x\Omega^y$ over the entire 4D generalized Brillouin zone:

$$\nu_2 = \frac{1}{4\pi^2}\oint_{\mathrm{BZ}}\Omega^x\Omega^y\,dk_x\,dk_y\,d\varphi_x\,d\varphi_y$$
Note that, to probe the intrinsic transport properties of the unperturbed system, both fields that generate the response have to be small perturbations. The evolution then remains adiabatic and the energy gap to the excited subbands remains open, which protects the topological invariants. Nonetheless, going beyond this limit can result in additional exciting phenomena. For example, a configuration with $\theta = \pi/4$ can lead to spatial frustration, and the resulting model might enable the observation of quantized electric quadrupole moments similar to those proposed previously.
Pump path. Varying the pump parameter $\varphi_x$ periodically modulates the tight-binding parameters $\delta J_x(\varphi_x)$ and $\Delta_x(\varphi_x)$ that describe the superlattice along x (equation (1)). For $d_l = 2d_s$, the modulation of $\delta J_x$ and $\Delta_x$ is out of phase, and the system therefore evolves along a closed trajectory in the $\delta J_x$-$\Delta_x$ parameter space (Extended Data Fig. 2a). This pump path encircles the degeneracy point $(\delta J_x = 0, \Delta_x = 0)$, at which the two lowest subbands of the Rice–Mele model touch. This singularity can be interpreted as the source of the non-zero Berry curvature $\Omega^x$ in the generalized Brillouin zone, which gives rise to the linear pumping response. All pump paths that encircle the degeneracy can be continuously transformed into one another without closing the gap to the first excited subband. They are thus topologically equivalent with respect to the linear response; that is, the value of $\nu_1$ does not change.
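Whether a pump path encircles the degeneracy point can be checked with the winding number of the closed curve $(\delta J_x(\varphi_x), \Delta_x(\varphi_x))$ around the origin. It is obtained by accumulating the change in polar angle along the path.

The Python sketch below uses the deep tight-binding modulations $\delta J = \delta J^{(0)}\sin\varphi$ and $\Delta = \Delta^{(0)}\cos\varphi$ with illustrative amplitudes. The offset that moves the path away from the origin is a hypothetical example. For this two-band model, the magnitude of the winding number is expected to match $|\nu_1|$, with the sign set by orientation conventions.

```python
import numpy as np


def winding_number(px, py):
    """Number of times the closed polygon (px[i], py[i]) winds around (0, 0)."""
    angles = np.arctan2(py, px)
    steps = np.diff(np.append(angles, angles[0]))  # include the closing segment
    steps = (steps + np.pi) % (2 * np.pi) - np.pi  # map each step to [-pi, pi)
    return int(round(steps.sum() / (2 * np.pi)))


phi = np.linspace(0.0, 2 * np.pi, 400, endpoint=False)
dJ0, D0 = 1.0, 1.0  # illustrative amplitudes
encircling = winding_number(dJ0 * np.sin(phi), D0 * np.cos(phi))
shifted = winding_number(dJ0 * np.sin(phi) + 2.0, D0 * np.cos(phi))  # hypothetical offset
print(encircling, shifted)  # -1 0
```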

Similarly, the tight-binding parameters $\delta J_y$ and $\Delta_y$ depend on the phase of the transverse superlattice $\varphi_y$. For a large cloud, all possible values of $\varphi_y$, and thus of $\delta J_y$ and $\Delta_y$, are sampled simultaneously (Extended Data Fig. 2b). During a pump cycle, the system therefore traces out a closed surface in the 4D parameter space of $\delta J_x$, $\Delta_x$, $\delta J_y$ and $\Delta_y$ (Extended Data Fig. 2c). In this parameter space, the two lowest subbands touch in the two planes $(\delta J_x = 0, \Delta_x = 0)$ and $(\delta J_y = 0, \Delta_y = 0)$, which intersect at a single point at the origin (Extended Data Fig. 2d). Analogously to the linear response, this degeneracy generates the non-zero Berry curvatures $\Omega^x$ and $\Omega^y$, which cause the nonlinear motion in the y direction. Owing to the 4D character of the parameter space, the 4D pump path can enclose the degeneracy (Extended Data Fig. 2e). Whenever this is the case, the topology of the cycle does not change and the value of $\nu_2$ remains the same.

To visualize the pump path in the 4D parameter space in Extended Data Fig. 2, 
we apply the following transformation 
where the tight-binding parameters are normalized by their respective maximum values. The two degeneracy planes then become perpendicular planes in $(r_1, r_2, r_3)$ space.
Lattice configuration. All experiments were performed in a mutually orthogonal, retro-reflected 3D optical lattice consisting of superlattices along x and y and a simple lattice in the z direction. Each superlattice is created by superimposing two standing waves: a short lattice with wavelength $\lambda_s = 767$ nm and a long lattice with $\lambda_l = 2\lambda_s$. The vertical lattice along z is formed by a standing wave with $\lambda_z = 844$ nm.
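For orientation, the recoil energies $E_{r,i} = h^2/(8md_i^2)$ used throughout (see the Fig. 2 caption) follow from the lattice wavelengths above, with $d_i = \lambda_i/2$ for retro-reflected standing waves. The Python sketch below evaluates them for $^{87}$Rb. The Planck constant, atomic mass unit and $^{87}$Rb mass are standard reference values added here; they are not quoted in the text.

```python
H_PLANCK = 6.62607015e-34  # J s (exact in the SI)
AMU = 1.66053906660e-27  # kg
M_RB87 = 86.909180527 * AMU  # kg


def recoil_energy_hz(wavelength_m, mass_kg=M_RB87):
    """E_r / h = h / (8 m d^2) with lattice spacing d = wavelength / 2."""
    d = wavelength_m / 2
    return H_PLANCK / (8 * mass_kg * d ** 2)


lambda_s = 767e-9
for name, lam in (("short", lambda_s), ("long", 2 * lambda_s), ("vertical", 844e-9)):
    print(f"{name:8s} lattice: d = {lam / 2 * 1e9:6.1f} nm, "
          f"E_r/h = {recoil_energy_hz(lam) / 1e3:.2f} kHz")
```

Because $d_l = 2d_s$, the long-lattice recoil energy is exactly one quarter of the short-lattice one.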
Initial state preparation for band-mapping measurements. For all sequences, a quarter-filled Mott insulator consisting of about 5,000 $^{87}$Rb atoms was prepared with one atom localized in the ground state of each unit cell, creating a uniform occupation of the lowest subband in the 2D superlattice. To this end, a Bose–Einstein condensate was loaded from a crossed dipole trap into the lattice. First, the blue-detuned short lattices along x and y were ramped up to $3.0(1)E_{r,s}$ over 50 ms to lower the initial density of the cloud of atoms. These lattices were then switched off again within 50 ms, while the vertical lattice and both long lattices were increased to $30(1)E_{r,z}$ and $30(1)E_{r,l}$, respectively, with $\varphi_x = 0.000(5)\pi$ and $\varphi_y = \varphi_y^0$. Subsequently, doubly occupied lattice sites were converted to singly occupied ones (see below), creating a Mott insulator with unit filling and a negligible fraction of doublons. Each lattice site was then split into a four-site plaquette by ramping up the short lattices along x and y to their final depth of $7.0(2)E_{r,s}$ and decreasing the long lattices to $20.0(6)E_{r,l}$ over 5 ms.

Removing doubly occupied sites. After preparing the Mott insulator with 
unit filling in the long lattices, sites containing two atoms were converted to 
singly occupied ones using microwave-dressed spin-changing collisions and a 
resonant optical push-out pulse***”. For this, the lattice depths are increased to
Vs,x = 70(2)Er,s, Vl,x = 30(1)Er,l, Vl,y = 70(2)Er,l and Vz = 100(3)Er,z over 5 ms to
maximize the on-site interaction energy. The atoms, which were initially in the
(F = 1, mF = −1) hyperfine state, were converted to (F = 1, mF = 0) by using an
adiabatic radio-frequency transfer. Here, F denotes the total angular momentum of
the atoms. By ramping a magnetic offset field in the presence of a microwave field,
we performed a Landau–Zener sweep that adiabatically converted pairs of mF = 0
atoms on the same lattice site to an mF = +1 and an mF = −1 atom via coherent
spin-changing collisions. The mF = −1 atoms were subsequently removed via an
adiabatic microwave transfer to (F = 2, mF = −2), which was followed by a resonant
optical pulse after lowering the lattices to Vs,x = 0Er,s, Vl,x = 30(1)Er,l, Vl,y = 40(1)Er,l
and Vz = 40(1)Er,z.

Sequence for pumping. The superlattice phase can be controlled by slightly changing 
the frequency of the lasers used for generating the long lattices and thereby moving 
the relative position between the short and long lattices at the position of the atoms. 
The pumping along x is performed by slowly changing φx, starting from the
staggered configuration at φx = 0.000(5)π, in which the energy difference between
neighbouring sites (|Δx|) is largest and the tunnel couplings are equal (δJx = 0). To
minimize non-adiabatic transitions to higher bands, each pump cycle consists of
three S-shaped ramps: φx ∈ [0, 0.5π], [0.5π, 1.5π] and [1.5π, 2π]. This reduces the
ramp speed in the vicinity of the symmetric double-well configuration (Δx = 0)
at φx = (l + 1/2)π, with l ∈ Z, at which the gap to the first excited band is smallest.
The duration of the π/2 ramps is 7 ms, and 14 ms for the ramp by π. Owing to
the limited tuning range of a single laser, a second laser is required for implementing
multiple pump cycles, which is set to a constant phase of φx = 0.000(5)π.
At the end of each cycle, an instantaneous switch from the primary laser to the 
secondary one is made, and within 5 ms the frequency of the former is ramped 
back to its initial value, corresponding to an identical lattice configuration. After 
switching back to the first laser, the next cycle continues as described above. We 
checked experimentally that this handover between the two lasers does not create 
any measurable band excitations. 
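The cycle structure described above can be sketched as a time profile. The ramps are 7 ms, 14 ms and 7 ms long and run between φx = 0, 0.5π, 1.5π and 2π. The text only says the ramps are "S-shaped", so the raised-cosine form used here is an assumption. It was chosen because its slope vanishes at every segment boundary, including the Δx = 0 points at 0.5π and 1.5π, where the text says the ramp should be slow.

```python
import numpy as np

# One pump cycle as (start phase, end phase, duration in ms); phases in units of pi.
# Segment boundaries and durations follow the text; the S-shape is an assumption.
SEGMENTS = [(0.0, 0.5, 7.0), (0.5, 1.5, 14.0), (1.5, 2.0, 7.0)]
CYCLE_MS = sum(dur for _, _, dur in SEGMENTS)  # 28 ms per cycle


def s_ramp(u):
    """Raised-cosine S-curve on [0, 1] with zero slope at both ends (assumed shape)."""
    return 0.5 * (1.0 - np.cos(np.pi * u))


def phi_x(t_ms):
    """Superlattice phase phi_x / pi at times t_ms (0 <= t <= CYCLE_MS) within one cycle."""
    t = np.atleast_1d(np.asarray(t_ms, dtype=float))
    if np.any((t < 0) | (t > CYCLE_MS)):
        raise ValueError("times must lie within a single pump cycle")
    out = np.empty_like(t)
    t0 = 0.0
    for start, end, dur in SEGMENTS:
        in_seg = (t >= t0) & (t <= t0 + dur)
        out[in_seg] = start + (end - start) * s_ramp((t[in_seg] - t0) / dur)
        t0 += dur
    return out


if __name__ == "__main__":
    print(phi_x([0.0, 3.5, 7.0, 14.0, 21.0, 28.0]))
```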

Measuring the in situ position. To determine the nonlinear COM displacement 
along y, a double-differential measurement was conducted to minimize the effect 
of shot-to-shot fluctuations of the atom position. To do this, the COM position is 
measured before (yi) and after (yf) the pumping and compared to a reference
sequence (yi^ref, yf^ref). For the latter measurement, the pumping is performed with
only the short lattice along y (at Vs,y = 40(1)Er,s); there is therefore no nonlinear
response. The initial position is obtained during the doublon removal sequence,
where the atoms are initially prepared in the (F = 1, mF = 0) hyperfine state and
one atom from each doubly occupied site is transferred to (F = 2, mF = −2) using
microwave-dressed spin-changing collisions (see above). In addition, we transfer
50% of the atoms on singly occupied sites to the F = 2 manifold by applying a
microwave π pulse resonant on the (F = 1, mF = 0) → (F = 2, mF = 0) transition.
The F = 2 atoms thus have the same density distribution as the remaining F = 1
atoms and are imaged before the push-out pulse, which removes them from the
lattice. The motion of the atoms due to the nonlinear response is then
Δy = (yf − yi) − (yf^ref − yi^ref). The difference in the COM displacement along y
between θ1 and θ2 is defined as Δy(θ1) − Δy(θ2). For the x direction, the displacement is
obtained directly from Δx = (xf − xi) − δx, without comparing it to the reference
sequence. Here, δx is the average displacement of all data for a given angle, accounting
for a small constant offset between the measured initial and final positions.
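The double-differential bookkeeping above amounts to a few subtractions. The sketch assumes the positions are NumPy-compatible arrays with one entry per experimental run. It also assumes, which the text does not state, that the x data set samples both pumping directions equally. Under that assumption the mean displacement of the data set is a reasonable estimate of the constant offset δx. The function names are hypothetical.

```python
import numpy as np


def delta_y(y_i, y_f, y_i_ref, y_f_ref):
    """Nonlinear COM displacement: (y_f - y_i) - (y_f_ref - y_i_ref)."""
    return (np.asarray(y_f) - np.asarray(y_i)) - (np.asarray(y_f_ref) - np.asarray(y_i_ref))


def displacement_difference(dy_theta1, dy_theta2):
    """Delta y(theta_1) - Delta y(theta_2), evaluated point by point."""
    return np.asarray(dy_theta1) - np.asarray(dy_theta2)


def delta_x(x_i, x_f):
    """Displacement along x, corrected by the mean offset delta_x of the data set."""
    raw = np.asarray(x_f, dtype=float) - np.asarray(x_i, dtype=float)
    offset = raw.mean()  # assumes forward and backward pumping are sampled equally
    return raw - offset
```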

Relation between COM position and double-well imbalance. If there are no
inter-double-well transitions along y, then the change in the double-well imbalance
δIy = Iy(φx) − Iy(φx = 0) can be related directly to the COM motion along y. The
COM position in the y direction is

$$y_{\mathrm{COM}} = \frac{d_l}{N}\sum_{i,j}\left[\left(j-\frac{1}{4}\right)N_{e,ij} + \left(j+\frac{1}{4}\right)N_{o,ij}\right]$$

where the sum is over all unit cells, Ne,ij (No,ij) is the occupation of the even (odd)
sites along y in the (i, j)th unit cell and N is the total number of atoms. Expressing
this in terms of the total number of atoms on even and odd sites, Ne = Σij Ne,ij
and No = Σij No,ij, and assuming that there are no transitions between neighbouring
unit cells along y (that is, Σi (Ne,ij + No,ij) remains constant), the change in the
COM position can be written as ΔyCOM = yCOM(φx) − yCOM(φx = 0) = d_l δIy/4. Note
that this derivation implicitly assumes that the COM of the maximally localized
Wannier functions on the lattice sites along y is independent of φx, which is a valid
approximation deep in the tight-binding regime; otherwise, the site separation d_l/2
entering the proportionality factor has to be replaced by the distance between the
COM of the Wannier functions on the even and odd sites of a double well.
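The relation ΔyCOM = d_l δIy/4 can be checked numerically on a toy occupation pattern. The sketch implements the sum for yCOM literally. It takes the imbalance as Iy = (No − Ne)/N, an assumption because this section does not define the sign convention. That is the convention for which the relation comes out with a positive prefactor. Occupation arrays are indexed as [i, j], with j the unit-cell index along y.

```python
import numpy as np


def y_com(n_even, n_odd, d_l=1.0):
    """y_COM = (d_l / N) * sum_ij [(j - 1/4) N_e,ij + (j + 1/4) N_o,ij]."""
    n_even = np.asarray(n_even, dtype=float)
    n_odd = np.asarray(n_odd, dtype=float)
    j = np.arange(n_even.shape[1])  # unit-cell index along y, broadcast over rows i
    total = n_even.sum() + n_odd.sum()
    return d_l / total * np.sum((j - 0.25) * n_even + (j + 0.25) * n_odd)


def imbalance(n_even, n_odd):
    """Assumed convention: I_y = (N_o - N_e) / N."""
    n_e, n_o = np.sum(n_even), np.sum(n_odd)
    return (n_o - n_e) / (n_o + n_e)


# Toy example: move atoms from even to odd sites *within* their unit cells,
# so the number of atoms in each unit-cell row j along y is unchanged.
before_e, before_o = [[1, 2], [3, 0]], [[0, 1], [1, 2]]
after_e, after_o = [[0, 2], [2, 0]], [[1, 1], [2, 2]]

dy = y_com(after_e, after_o) - y_com(before_e, before_o)
d_imb = imbalance(after_e, after_o) - imbalance(before_e, before_o)
print(dy, d_imb / 4)  # equal up to floating-point rounding
```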

Direct determination of the second Chern number. To determine the second
Chern number directly from the measured double-well imbalance Iy(φx), the
average change in the imbalance per cycle for the entire cloud is obtained
from a linear fit of the differential imbalance Iy(φx) − Iy(−φx) for each value
of φy^(0). The influence of the excitations can be reduced by restricting the fitting
region to a small number of pump cycles. The response of an infinite system is
reconstructed by averaging this per-cycle change over φy^(0), using linear interpolation
between the data points. When taking into account all points with φx/(2π) < 3, this gives
ν2^exp = 0.84(17) for the data in Fig. 3. Note that the linear interpolation for the
discrete sampling used in Fig. 3c leads to a systematic shift in ν2^exp of +0.05. When
correcting for the finite pumping efficiency along x (see below), which can be
measured independently without prior knowledge about the system, we obtain
ν2^exp = 0.94(19).

Model for double-well imbalance including experimental imperfections. To
isolate the nonlinear response of the lowest band from the band-mapping data, we
use a simple model that takes into account band excitations, double occupation
of plaquettes and the experimental pumping efficiency of the linear response. The
average double-well imbalance Iy(φx) can be written as

$$I_y(\varphi_x) = \eta_{\mathrm{gs}}\,I_y^{\mathrm{gs}}(\varphi_y) + \eta_{\mathrm{exc}}\,I_y^{\mathrm{exc}}(\varphi_y) + \eta_{\mathrm{d}}\,I_y^{\mathrm{d}}(\varphi_y)$$


where ηgs (ηexc) is the fraction of atoms on singly occupied plaquettes in the ground
(first excited) state along y and ηd is the fraction of atoms on doubly occupied
plaquettes, which we assume to be in the ground state. These quantities can be
determined experimentally at each point in the pumping sequence. Iy^gs, Iy^exc and
Iy^d denote the imbalances of the corresponding states, which depend on the local
phase of the y superlattice at the position of the cloud along x, φy(xCOM). The
imbalance curves can be calculated theoretically using the respective double-well
Hamiltonian (equations (5) or (6)) and can be obtained experimentally by studying
the linear pumping response. The COM position in turn depends on the pump
parameter φx and includes corrections for the finite pumping efficiency

$$x_{\mathrm{COM}}(\varphi_x) = \mathrm{sgn}(\varphi_x)\sum_{i=1}^{|\varphi_x|/\pi}\left(2\beta_0\beta^{i}-1\right)d_s$$

for φx/π ∈ Z. Here, β0 = 0.980(4) is the initial ground-state occupation along x
and β = 0.986(2) is the pumping efficiency, given by the fraction of atoms that
remain in the lowest subband during each half of a pump cycle and are therefore
transferred by one lattice site along x. The main contributions that limit
the pumping efficiency are band excitations in the pumping direction and non-adiabatic
transitions between neighbouring double wells induced by the external
harmonic confinement. Although the local slope of the transverse response for
doubly occupied plaquettes differs from that for single atoms, they exhibit the same
quantized transport along x and y for the parameters used in the experiment when
covering the entire 4D pump path.
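The two analysis steps described in this part of the Methods can be sketched as below. The first is the weighted model imbalance. The second is the linear fit of the differential imbalance Iy(φx) − Iy(−φx), restricted to φx/(2π) < 3, used in the direct determination of the second Chern number. The function names are hypothetical. Halving the fitted slope to obtain the change per cycle for one pumping direction is also our assumption about how the forward and backward data combine; the text does not state it.

```python
import numpy as np


def model_imbalance(eta_gs, eta_exc, eta_d, imb_gs, imb_exc, imb_d):
    """I_y = eta_gs * I_gs + eta_exc * I_exc + eta_d * I_d (fractions and state imbalances)."""
    return eta_gs * imb_gs + eta_exc * imb_exc + eta_d * imb_d


def per_cycle_change(cycles, imb_forward, imb_backward, max_cycles=3.0):
    """Linear fit of I_y(phi_x) - I_y(-phi_x) versus phi_x / (2 pi).

    Only points with cycles < max_cycles are used, which limits the influence
    of accumulated band excitations. Returns half the fitted slope (assumption:
    forward and backward pumping contribute equally to the differential signal).
    """
    cycles = np.asarray(cycles, dtype=float)
    diff = np.asarray(imb_forward, dtype=float) - np.asarray(imb_backward, dtype=float)
    keep = cycles < max_cycles
    slope, _offset = np.polyfit(cycles[keep], diff[keep], 1)
    return slope / 2
```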

Measuring band excitations. Band excitations in the y direction are measured by
adiabatically ramping the superlattice phase φy from its initial value to
π/2 ± 0.156(5)π and subsequently increasing the short lattice depth to
Vs,y = 40(1)Er,s. In this lattice configuration, ground-state atoms on singly and
doubly occupied plaquettes are fully localized on the lower-lying site along y, owing
to the large double-well tilt Δy and the suppression of tunnelling as Jy, δJy → 0.
On the other hand, atoms in the excited band along y localize on the higher- 
lying site and can be detected directly by measuring the resulting double-well 
imbalance. 

Detecting doubly occupied plaquettes. The doublon fraction can be determined 
by taking advantage of the fact that two atoms in the same double well localize on 
the lower-lying site only at much larger double-well tilts than for a single atom, 
owing to the repulsive on-site interaction. For this, the double wells along y are first
merged into a single site by removing the short lattice and increasing the long
lattice to Vl,y = 30(1)Er,l within 5 ms. At the same time, the orthogonal lattice depths
are ramped up to Vs,x = 70(2)Er,s and Vz = 100(3)Er,z to increase the interaction
energy. After that, φy is shifted adiabatically to either 0.474(5)π or 0.431(5)π and
the sites are again split into double wells by ramping up the short lattice to
Vs,y = 40(1)Er,s. At φy = 0.431π, single atoms and doublons are both fully localized
on the lower-lying site. On the other hand, at φy = 0.474π, single atoms are still
very well localized on the lower site, but two atoms in the same double well localize
on different sites owing to the large interaction energy U > Δy. By determining the
site occupations for both phases, we can therefore infer the doublon fraction from
the difference in the even–odd imbalance between the two measurements.
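The sketch below models this procedure in an idealized form. It assumes complete localization at both phases and perfect splitting of doublons at 0.474π. Under these assumptions the doublon fraction is simply the drop in lower-site imbalance between the two phases. The sketch works with a hypothetical imbalance towards the lower-lying site, (Nlower − Nhigher)/N, so that it does not depend on whether the even or the odd site is lower.

```python
def lower_site_imbalance(n_lower, n_higher):
    """Imbalance towards the lower-lying site, (N_lower - N_higher) / N."""
    return (n_lower - n_higher) / (n_lower + n_higher)


def doublon_fraction(imb_at_0431, imb_at_0474):
    """Fraction of atoms on doubly occupied double wells (idealized model).

    At phi_y = 0.431 pi every atom sits on the lower site; at 0.474 pi each doublon
    puts one atom on each site, lowering the imbalance by exactly the doublon fraction.
    """
    return imb_at_0431 - imb_at_0474


# Synthetic check: 1,000 atoms, 20% of them in doublons (100 doubly occupied wells).
n_atoms, eta_d = 1000, 0.2
n_doublon_atoms = int(n_atoms * eta_d)
imb_0431 = lower_site_imbalance(n_atoms, 0)
imb_0474 = lower_site_imbalance(n_atoms - n_doublon_atoms // 2, n_doublon_atoms // 2)
print(doublon_fraction(imb_0431, imb_0474))
```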

Calculating the double-well imbalance along y. The measurement of the population
imbalance in the y direction as a function of φx for Figs 3 and 4 is performed
after an integer or half-integer number of pump cycles (φx = lπ, l ∈ Z). At these
points, the superlattice along x is in the staggered configuration, with the maximum
energy offset |Δx| ≫ Jx, and δJx = 0. The atoms are thus fully localized on either even
or odd sites along x for φx = 2lπ or φx = (2l + 1)π, respectively. The four-site unit
cell of the 2D superlattice therefore effectively reduces to a double well along y.

For singly occupied double wells, the expected imbalance in the y direction for
atoms in the ground (Iy^gs) and first excited state (Iy^exc) can then be calculated from
the single-particle double-well Hamiltonian

$$H_{\mathrm{DW}}^{(1)}(\varphi_y) = \begin{pmatrix} \Delta_y(\varphi_y)/2 & -J_y^{e}(\varphi_y) \\ -J_y^{e}(\varphi_y) & -\Delta_y(\varphi_y)/2 \end{pmatrix} \qquad (5)$$

with Jy^e(φy) = Jy(φy) + δJy(φy)/2 and using the Fock basis for the atoms on even
and odd sites, |1, 0⟩ and |0, 1⟩, respectively.

Correspondingly, the imbalance for the ground state of a doubly occupied double
well (Iy^d) can be determined using the two-particle double-well Hamiltonian

$$H_{\mathrm{DW}}^{(2)}(\varphi_y) = \begin{pmatrix} U+\Delta_y & -\sqrt{2}J_y^{e} & 0 \\ -\sqrt{2}J_y^{e} & 0 & -\sqrt{2}J_y^{e} \\ 0 & -\sqrt{2}J_y^{e} & U-\Delta_y \end{pmatrix} \qquad (6)$$

in the Fock basis {|2, 0⟩, |1, 1⟩, |0, 2⟩}. Here, U denotes the on-site interaction
energy for two atoms localized on the same lattice site.
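Equations (5) and (6) can be diagonalized directly to obtain the state imbalances. The sketch assumes the sign convention I = ⟨No − Ne⟩/N, which this section does not fix. Parameters are in arbitrary but common energy units, and the function names are illustrative.

```python
import numpy as np


def single_particle_imbalances(delta, j_e):
    """(ground, first excited) imbalances from equation (5), basis |1,0>, |0,1>.

    Assumed sign convention: I = <N_o - N_e> / N.
    """
    h = np.array([[delta / 2, -j_e],
                  [-j_e, -delta / 2]])
    _, vecs = np.linalg.eigh(h)                # eigenvalues in ascending order
    imb = vecs[1, :] ** 2 - vecs[0, :] ** 2    # P(odd) - P(even) for each eigenvector
    return imb[0], imb[1]


def doublon_imbalance(delta, j_e, u):
    """Ground-state imbalance from equation (6), basis |2,0>, |1,1>, |0,2>."""
    s = -np.sqrt(2) * j_e
    h = np.array([[u + delta, s, 0.0],
                  [s, 0.0, s],
                  [0.0, s, u - delta]])
    _, vecs = np.linalg.eigh(h)
    probs = vecs[:, 0] ** 2                    # ground-state occupation probabilities
    n_odd_minus_even = np.array([-2.0, 0.0, 2.0])
    return float(probs @ n_odd_minus_even) / 2  # divide by the two atoms
```

For U = 0 the two-particle result reduces to the single-particle ground-state imbalance. For U > Δy the two atoms end up on different sites, consistent with the doublon-detection scheme described above.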

Fit function for nonlinear response. On the basis of the above model, the
measured data are fitted with the function Iy(φx) + I0. The two fit parameters are
the pre-factor a, which describes the change in the superlattice phase along y with
φx compared with the ideal case, and an overall offset I0. The transport properties
of the lowest band are encoded in the slope of the ground-state imbalance at φx = 0.
Knowing a, this slope can be related to the ideal slope. Per cycle, this gives the
corresponding change in the population imbalance for ground-state atoms.

Determining the second Chern number from the scaling of the nonlinear
response with θ. The COM displacement per cycle along y for an infinite system,
δyCOM, is proportional to ν2θ and therefore scales linearly with the perturbing
angle θ. The second Chern number can thus be extracted from the slope of δyCOM(θ).
Having confirmed that the measured shape of the per-cycle imbalance change as a
function of φy^(0) is the same as expected theoretically, the response of an infinite
system at a given angle θ can be inferred from a single measurement of this
imbalance change at a fixed φy^(0). This holds for all angles because the shape of
this curve is independent of θ. To obtain ν2, it is therefore sufficient to determine
the slope of the per-cycle imbalance change as a function of θ at a constant φy^(0).
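The final step, the slope of the per-cycle imbalance change versus θ at fixed φy^(0), is an ordinary linear fit. In the sketch below the fit includes an offset term so that a small θ-independent background does not bias the slope. A fit through the origin would be the alternative if such a background can be excluded. Converting the slope into ν2 needs the proportionality constant that links δyCOM to ν2θ, which is not reproduced in this section, so the sketch stops at the slope. The data in the check are synthetic.

```python
import numpy as np


def response_slope(theta_mrad, per_cycle_change):
    """Slope of the per-cycle imbalance change versus tilt angle theta (linear fit)."""
    slope, _offset = np.polyfit(np.asarray(theta_mrad, dtype=float),
                                np.asarray(per_cycle_change, dtype=float), 1)
    return slope


theta = np.array([-0.5, 0.0, 0.54, 1.0])   # synthetic angles (mrad)
response = 0.3 * theta + 0.01              # synthetic per-cycle changes
print(response_slope(theta, response))
```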

Nonlinear response versus lattice depth. The technique for detecting the
nonlinear response with site-resolved band mapping, introduced in the main text,
allows us to determine the slope accurately over a wide range of lattice parameters.
To demonstrate this, we measure the slope of the nonlinear response at
φy^(0) = 0.500(5)π and θ = 0.54(3) mrad for various values of the transverse
short-lattice depth Vs,y (Extended Data Fig. 1). As expected, the slope increases with
larger depths as the band gap decreases and the Berry curvature Ω^y becomes more
and more localized around φy = (l + 1/2)π with l ∈ Z.

At Vs,y = 6.25Er,s, the first and second excited subbands along y touch for
φy = lπ, leading to a topological transition in which the first and second
Chern numbers of the first excited subband change sign, from +1 for Vs,y < 6.25Er,s
to −1 for Vs,y > 6.25Er,s. This corresponds to a transition between the Landau and
Hofstadter regimes”’. For the lowest band, the two regimes are topologically
equivalent and the atoms therefore move in the same direction. In both limits, the
experimentally determined slope agrees well with that expected for an
ideal system. This illustrates that the transport properties of the lowest band
can be extracted correctly in both regimes, even in the presence of atoms in the
first excited band.

Alignment of the tilted superlattice. Each optical lattice is created by retroreflecting
a laser beam, which is focused onto the atoms by a lens on either side of
the cloud. For the superlattices, the incoming beams of the short and long lattices
are overlapped using a dichroic mirror in front of the first lens. To control the tilt
angle θ of the long lattice along y, a glass block is placed in the beam path before the
overlapping. By rotating this glass block, a parallel displacement of the incoming
beam can be induced, which is then converted into an angle θ relative to the
short-lattice beam at the first lens. The two beams intersect at the focus of the lens,
which corresponds to the position of the cloud of atoms. After passing through the
second lens behind the cloud, both beams are retroreflected by the same mirror.
The counter-propagating beams travel along the paths of the incoming beams,
thereby creating the lattice potentials with the same relative angle θ.

Determining the angle θ. When the long lattice in the y direction is tilted by
an angle θ with respect to the short lattice, the phase of the superlattice along
y depends on the position along x. This leads to a modification of the on-site
potential, which for small angles can be approximated as a linear gradient along
the x axis, pointing in opposite directions on even and odd sites in y. The strength
of this gradient is set by θ and by the slope ∂Δy/∂φy at the given superlattice
phase φy^(0), and can therefore be used to determine θ. To do this in the experiment,
a superfluid is prepared at k = 0 in a 2D lattice with Vs,x = 13.0(4)Er,s and
Vl,y = 10.0(3)Er,l. After increasing Vl,y to 70(2)Er,l within 0.2 ms, the lattice sites
are split along y by ramping up the short lattice in the y direction to
Vs,y = 20.0(6)Er,s in 0.4 ms. The superlattice phase φy^(0) is set to either
0.344(5)π or 0.656(5)π such that the atoms fully localize on even or odd sites along y,
respectively. The resulting Bloch oscillations that are induced by the gradient are
probed by measuring the momentum distribution of the atoms after a variable hold
time. The angle θ is then calculated from the average Bloch oscillation period of
both phases to minimize the influence of additional residual gradients.
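The measured Bloch oscillation period is linked to the gradient by the standard relation TB = h/(F d), for a force F along a lattice of period d. The sketch below averages the periods measured for the two phases, as described, and converts the result to a force. Taking d as the short-lattice spacing λs/2 along x is an assumption. The further conversion from F to θ needs the gradient formula, which is not reproduced in this section, so it is left out.

```python
H_PLANCK = 6.62607015e-34  # J s
D_SHORT = 767e-9 / 2       # assumed lattice period along x: short lattice, lambda_s / 2


def mean_bloch_period(period_even_s, period_odd_s):
    """Average of the Bloch periods measured for atoms on even and odd sites."""
    return 0.5 * (period_even_s + period_odd_s)


def force_from_bloch_period(period_s, spacing_m=D_SHORT):
    """Force F = h / (T_B d) that yields a Bloch period T_B in a lattice of period d."""
    return H_PLANCK / (period_s * spacing_m)


if __name__ == "__main__":
    t_b = mean_bloch_period(1.1e-3, 0.9e-3)  # synthetic periods in seconds
    print(force_from_bloch_period(t_b))
```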

Data availability. The data that support the findings of this study are available 
from the corresponding author on reasonable request. 
Extended Data Figure 1 | Nonlinear response versus depth of the short
lattice along y. Slope of the nonlinear response at φy^(0) = 0.500(5)π and
θ = 0.54(3) mrad as a function of Vs,y, with all other lattice parameters as in
Figs 3 and 4. Jy^e = Jy(φy) + δJy(φy)/2 with φy = π/2 is the maximum
intra-double-well tunnelling rate along y, which is calculated from the
corresponding lattice depth. The solid line indicates the theoretically
expected slope and the error bars show the fit error for the slope. The
dashed line at Vs,y = 6.25Er,s marks the point at which a topological
transition occurs in the first excited subband along y, indicating the
transition between the Landau regime for Vs,y < 6.25Er,s and the
Hofstadter regime for Vs,y > 6.25Er,s.
Extended Data Figure 2 | Pump cycle of the 2D topological charge
pump. The 4D tight-binding parameter space (δJx, Δx, δJy, Δy) is
visualized using the transformation in equation (4). a, Changing the pump
parameter φx leads to a periodic modulation of δJx and Δx along a closed
trajectory, as shown in the inset for a full pump cycle φx = 0 → 2π. This
pump path (green) encircles the degeneracy point at the origin (grey), at
which the gap between the two lowest subbands of the Rice–Mele model
closes. The surface in the main plot shows the same trace transformed
according to equation (4) and with φy ∈ [0.46π, 0.54π]. The spacing of
the mesh grid illustrating φx is π/10. b, For a given φx, a large system
simultaneously samples all values of φy. This corresponds to a closed path
in δJy–Δy parameter space, in which a singularity also occurs at the origin
(inset). The main plot shows the transformed path for φy ∈ [0.46π, 0.54π].
c, In a full pump cycle, such a system therefore covers a closed surface
in the 4D parameter space by translating the path shown in b along the
trajectory from a. d, In the transformed parameter space, the singularities
at (δJx = 0, Δx = 0) and (δJy = 0, Δy = 0) correspond to two planes that
touch at the origin. e, Cut around r4 = 0 showing both the pump path from
c (red/blue) and the singularities from d (grey). Whereas they intersect in
the 3D space (r1, r2, r3), the value of r4 is different on both surfaces and the
4D pump path thus fully encloses the degeneracy planes.
Photonic topological boundary pumping as a probe
of 4D quantum Hall physics 


Oded Zilberberg', Sheng Huang’, Jonathan Guglielmon*, Mohan Wang’, Kevin P. Chen’, Yaacov E. Kraus*t & 


Mikael C. Rechtsman? 


When a two-dimensional (2D) electron gas is placed in a 
perpendicular magnetic field, its in-plane transverse conductance 
becomes quantized; this is known as the quantum Hall effect1.
It arises from the non-trivial topology of the electronic band 
structure of the system, where an integer topological invariant 
(the first Chern number) leads to quantized Hall conductance. 
It has been shown theoretically that the quantum Hall effect can 
be generalized to four spatial dimensions”, but so far this has 
not been realized experimentally because experimental systems 
are limited to three spatial dimensions. Here we use tunable 2D 
arrays of photonic waveguides to realize a dynamically generated 
four-dimensional (4D) quantum Hall system experimentally. The 
inter-waveguide separation in the array is constructed in such a 
way that the propagation of light through the device samples over 
momenta in two additional synthetic dimensions, thus realizing 
a 2D topological pump**. As a result, the band structure has 4D 
topological invariants (known as second Chern numbers) that 
support a quantized bulk Hall response with 4D symmetry’. In 
a finite-sized system, the 4D topological bulk response is carried 
by localized edge modes that cross the sample when the synthetic 
momenta are modulated. We observe this crossing directly through 
photon pumping of our system from edge to edge and corner to 
corner. These crossings are equivalent to charge pumping across a 
4D system from one three-dimensional hypersurface to the spatially 
opposite one and from one 2D hyperedge to another. Our results 
provide a platform for the study of higher-dimensional topological 
physics. 

Topology manifests naturally in solid-state systems. In insulators, 
electrons fill electronic states below the bandgap of the system. These 
states can be mapped mathematically onto abstract shapes that are char- 
acterized by a topological invariant. The realization that these topo- 
logical invariants manifest as quantized bulk responses, and through 
corresponding topologically protected boundary states, has revolu- 
tionized our understanding of material properties. These phenomena 
have been explored in several fields in systems beyond solid-state mate- 
rials, including photonic®* and ultracold atomic'*"!” systems. 

The introduction of topology into photonics? has opened up many 
avenues of research. Much of this research has focused on the exper- 
imental observation of topologically protected edge states in systems 
such as photonic crystals in the microwave domain'®!, as well as arrays 
of waveguides®*"! and integrated ring resonators at optical frequen- 
cies!*. In these systems, dielectric structures act as lattices for light, 
leading to topological 2D photonic bands. Beyond two dimensions, 
experiments with three-dimensional (3D) lattices have unveiled topo- 
logical features'® such as Weyl points!?°. 

Topological phases can be defined and understood
mathematically beyond three dimensions, with a hallmark example
being the 4D quantum Hall effect?-*+”. In 2D quantum Hall systems, 


energy bands are characterized by the first Chern number, which 
quantizes the Hall conductance and therefore counts one-dimensional 
(1D) chiral edge states in the system. In 4D systems, energy bands 
are characterized by another topological invariant—the second Chern 
number?~472!-74. Similarly to the 2D case, the 4D invariant manifests
through an additional quantized bulk response with 4D hypersurface 
phenomena. Until recently, the latter seemed only of theoretical interest 
because its realization requires four spatial dimensions. The flexibility 
of atomic and photonic systems, however, has inspired proposals to 
include synthetic dimensions to realize higher-dimensional topological 
physics?>-8.

The concept of topological pumps lends itself well to synthetic 
dimensions and higher-dimensional physics. Consider a family of 
1D systems parameterized by a momentum in a synthetic orthogonal 
dimension. This momentum is the pump parameter that maps the 1D 
pump to the 2D quantum Hall system with a first Chern number®*. 
The topological bulk response of the 1D pump matches that of the 2D 
quantum Hall effect: varying the pump parameter generates an 
electromotive force that pushes an integer number of charges per pump 
cycle across the physical dimension®. 1D pumps have recently been 
demonstrated in cold atom!*” and photonic®* experiments. 

A 2D topological pump can be subject to two pump parameters, 
corresponding to a 4D quantum Hall system’. In its simplest form, a 
4D quantum Hall system is the sum of two 2D quantum Hall systems 
in disjoint planes””””®, residing in the direct product space associated 
with the individual models. Correspondingly, a 2D topological pump 
manifests as the sum of two 1D pumps on orthogonal axes’. Here we 
consider ‘off-diagonal’ pumps in which the hopping is modulated as a 
function of the pump parameters®*; that is, we study a 2D tight-binding 
model of particles that hop on a lattice described by the Hamiltonian 
(Fig. 1a) 


$$H = \sum_{x,y}\left[t_x(\varphi_x)\,c^{\dagger}_{x+1,y}c_{x,y} + t_y(\varphi_y)\,c^{\dagger}_{x,y+1}c_{x,y} + \mathrm{h.c.}\right] \qquad (1)$$

where c_x,y annihilates a particle at site (x, y); ti(φi) = t̄i + λi cos(2πbi i + φi),
with i ∈ {x, y}, are modulated hopping amplitudes in the i direction,
with bare hopping amplitudes t̄i and modulation amplitudes λi. The modulation
frequencies bi are mapped in four dimensions to two magnetic fields
in the x–v and y–w planes’. The pump parameters φx and φy correspond
to momenta in the v and w directions, respectively; that is, their modulation
dynamically generates electric-field perturbations in these
directions. Considering that the pump parameters correspond to additional
synthetic dimensions, we characterize bandgaps of the 2D pump
with non-trivial second Chern numbers that manifest as a quantized
bulk response with 4D symmetry’.
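The modulated hopping amplitudes of equation (1) are easy to tabulate. The sketch uses the device values given later in the text (t̄ = 1.94 cm−1, λ = 1.06 cm−1, b = 1/3) and an illustrative pump parameter φ = 0.47π. Indexing bonds from j = 0 is our choice. With b = 1/3 the pattern repeats every three bonds, which gives the three distinct hopping amplitudes per direction sketched in Fig. 1a.

```python
import numpy as np


def hoppings(n_sites, t_bar, lam, b, phi):
    """t(j) = t_bar + lam * cos(2 pi b j + phi) for the n_sites - 1 bonds of a chain."""
    j = np.arange(n_sites - 1)
    return t_bar + lam * np.cos(2 * np.pi * b * j + phi)


T_BAR, LAM, B = 1.94, 1.06, 1 / 3                  # cm^-1, cm^-1, modulation frequency
t_x = hoppings(13, T_BAR, LAM, B, 0.47 * np.pi)    # 13 columns -> 12 bonds along x
print(np.round(t_x, 3))
```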

We realize such a 2D topological pump using arrays of coupled wave- 
guides (Fig. 1b). Each array is constructed to emulate the 2D pump 
Figure 1 | 2D topological pump and its band structure. a, Schematic of
the lattice model (equation (1)) with a 3 × 3 unit cell, that is, bx = 1/3 and
by = 1/3, resulting in three different hopping amplitudes (solid, dashed and
dotted lines) in each direction, which can be modulated using the pump
parameters φx and φy. b, Illustration of the 2D (7 × 13) array of waveguides
with z-dependent spacing. Light is injected into the input facet, is pumped 
across the array during its propagation (owing to the topological nature of 
the 2D pump) and is collected on the other side using an InGaAs CCD 
camera. c, Calculated band structure for a similar device, consisting of a 
70 × 70 array of coupled waveguides, where energy E is plotted along the
path φx = φy (larger dimensions chosen for clarity) at a wavelength of
1,550 nm, normalized by the bare hopping amplitude t̄. Bulk modes are
shown in grey, edge modes in red and orange, and corner modes in black. 
The insets show representative wavefunctions for each type of mode. For 
our choice of pump parameters, the edge modes (red and orange) form 
wedges owing to their degeneracy. The corner modes vanish into the bulk 
bands along their pump path and weakly hybridize with bulk modes. We 
perform pumping experiments to study the properties of these boundary 
states, in which φx and φy are scanned between 0.47π and 2.19π (vertical
dashed lines; arrows indicate the pumping direction); see Figs 2 and 3. 


(equation (1) with bx = 1/3 and by = 1/3), using 7 rows and 13 columns.
The inter-waveguide separation is such that the evanescent coupling
between nearest-neighbour waveguides is modulated according to
equation (1), with λx = λy = 1.06 cm−1 and t̄x = t̄y = 1.94 cm−1 (at a
wavelength of 1,550 nm). However, the evanescent coupling is a
function of both separation and wavelength (Methods, Extended Data
Fig. 1). Therefore, the resulting structure has coupling between
waveguides beyond its nearest neighbours and the emulated model does not
decompose into two disjoint 1D pumps. Despite this, the spectrum for
the device demonstrates gap-traversing boundary states, with both edge 
and corner states (Fig. 1c, Methods, Extended Data Fig. 2). 

The appearance of such edge phenomena results from the non-trivial 
4D topology of the 2D pump. The 4D symmetry of the second Chern 
number bulk response generates two types of response: density-type 
and Lorentz-type’””’. The edge bands support the former and the corner
modes the latter (Methods). For clarity, we explain the appearance
of the topological boundary modes by studying the structure of the
model described in equation (1). Because this model can be decomposed
into decoupled 1D pumps, each having gaps with non-trivial
first Chern numbers™®’, we have the following: (a) the spectrum of
the 2D pump is a Minkowski sum of the spectra of the two 1D pumps,
E = Ex + Ey; (b) the states of the model are product states of the two
independent models; and (c) the product bands are associated with
second Chern numbers that are equal to the product of the individual
first Chern numbers’. The third result leads to non-trivial bulk
phenomena only when gaps remain open in the summed Minkowski
spectra. Importantly, the second Chern number and the corresponding
4D symmetry of its associated bulk responses imply that pumping
will occur in response to a scan of either or both pump parameters φi
(Methods).
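Properties (a) and (b) can be checked numerically for the nearest-neighbour model of equation (1) with open boundaries. Its Hamiltonian is the Kronecker sum Hx ⊗ 1 + 1 ⊗ Hy of two 1D pump Hamiltonians. Its eigenvalues are therefore all pairwise sums Ex + Ey, and its eigenvectors are products. The sketch uses a 13 × 7 lattice and illustrative pump parameters. As noted in the text, the fabricated device has longer-range coupling and is not exactly of this form.

```python
import numpy as np


def pump_1d(n_sites, t_bar, lam, b, phi):
    """Open-chain Hamiltonian with hoppings t(j) = t_bar + lam cos(2 pi b j + phi)."""
    t = t_bar + lam * np.cos(2 * np.pi * b * np.arange(n_sites - 1) + phi)
    return np.diag(t, 1) + np.diag(t, -1)


def pump_2d(nx, ny, params_x, params_y):
    """Equation (1) as a Kronecker sum; site (x, y) has index x * ny + y."""
    hx = pump_1d(nx, *params_x)
    hy = pump_1d(ny, *params_y)
    return np.kron(hx, np.eye(ny)) + np.kron(np.eye(nx), hy)


nx, ny = 13, 7
px = (1.94, 1.06, 1 / 3, 0.47 * np.pi)   # illustrative (t_bar, lambda, b, phi_x)
py = (1.94, 1.06, 1 / 3, 1.2 * np.pi)    # illustrative (t_bar, lambda, b, phi_y)
h2d = pump_2d(nx, ny, px, py)

ex, vx = np.linalg.eigh(pump_1d(nx, *px))
ey, vy = np.linalg.eigh(pump_1d(ny, *py))
minkowski = np.sort((ex[:, None] + ey[None, :]).ravel())  # (a): all sums E_x + E_y
print(np.allclose(np.linalg.eigvalsh(h2d), minkowski))

product = np.kron(vx[:, 0], vy[:, 0])                      # (b): a product eigenstate
print(np.allclose(h2d @ product, (ex[0] + ey[0]) * product))
```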

Let us now consider these properties of the model in equation (1) 
in an open geometry. Because each 1D pump has 1D bulk modes and 
zero-dimensional (0D) boundary modes, (a) and (b) above imply 
that the 2D pump states are grouped into three categories: (i) 2D bulk 
modes composed of products of 1D bulk modes; (ii) edge modes com- 
posed of products of 1D bulk modes with a 0D boundary; and (iii) 
corner modes that are a product of 0D boundaries. The boundary 
modes (cases (ii) and (iii)) support the quantized second Chern 
number response (Methods). The 1D edge states of the 2D system 
are pumped in response to a single pump parameter and map onto 
3D hypersurface states in four dimensions. The 0D corner states are 
pumped in response to one or both pump parameters and map to 2D 
hypersurface states. These states highlight the hypersurface phenomena 
that are associated with the second Chern number. 

Our device does not decompose perfectly into two 1D pumps, owing 
to longer-ranged hopping. Nevertheless, the bulk gaps remain open. As 
a result, the characterization of these gaps by non-trivial second Chern 
numbers implies that the bulk response must remain unchanged. The 
appearance of edge states that traverse the gaps as a function of the 
pump parameters φi supports this response in a finite-sized system.
Here we probe the behaviour of these states experimentally. 

The waveguide array (Fig. 1b) is fabricated using femtosecond-laser writing, such that each single-mode waveguide couples evanescently to its neighbours. When light is injected into the array, it excites eigenmodes according to their spatial overlap with the input beam. The diffraction of light through the array is governed by the paraxial Schrödinger equation, i∂_z ψ = H(z)ψ, in which the time coordinate t of the usual Schrödinger equation is replaced by the propagation distance z; ψ represents the tight-binding wavefunction and H(z) is the Hamiltonian. The diffraction of light through the array therefore mimics the time evolution of the wavefunction of a quantum particle. Consequently, time-dependent pumping means adiabatically varying φ_i along the waveguide axis: φ_i → φ_i(z).
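The following sketch illustrates the mapping of z onto time. It integrates the tight-binding form of i∂_z ψ = H(z)ψ for a single hypothetical 1D chain. The bond amplitudes of the chain follow an off-diagonal Harper form, t̄ + λ cos(2πbx + φ), and φ is swept linearly in z over the range quoted below.

Several choices here are assumptions made for illustration, not the device parameters: t̄ = 1, λ = 0.5, b = 1/3, the chain length and the propagation length. The fourth-order Runge–Kutta stepper is a generic integrator and is not the method used in the paper.

```python
import math

def harper_hoppings(num_sites, phi, t_bar=1.0, lam=0.5, b=1/3):
    """Bond amplitudes t_x = t_bar + lam*cos(2*pi*b*x + phi), x = 0..num_sites-2.
    Hypothetical illustrative parameters, not the fabricated device values."""
    return [t_bar + lam * math.cos(2 * math.pi * b * x + phi)
            for x in range(num_sites - 1)]

def apply_hamiltonian(hops, psi):
    """Open-chain tight-binding Hamiltonian (no on-site terms) acting on psi."""
    out = [0j] * len(psi)
    for x, t in enumerate(hops):
        out[x] += t * psi[x + 1]
        out[x + 1] += t * psi[x]
    return out

def propagate(psi0, phi_start, phi_end, z_total, dz=0.01):
    """Integrate i dpsi/dz = H(z) psi with RK4 while phi(z) ramps linearly in z."""
    n = len(psi0)

    def rhs(z, vec):
        phi = phi_start + (phi_end - phi_start) * z / z_total
        return [-1j * h for h in apply_hamiltonian(harper_hoppings(n, phi), vec)]

    psi = list(psi0)
    steps = int(round(z_total / dz))
    for step in range(steps):
        z = step * dz
        k1 = rhs(z, psi)
        k2 = rhs(z + dz / 2, [p + dz / 2 * k for p, k in zip(psi, k1)])
        k3 = rhs(z + dz / 2, [p + dz / 2 * k for p, k in zip(psi, k2)])
        k4 = rhs(z + dz, [p + dz * k for p, k in zip(psi, k3)])
        psi = [p + dz / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
               for p, a1, a2, a3, a4 in zip(psi, k1, k2, k3, k4)]
    return psi

# Inject light into the leftmost waveguide of a 21-site chain and sweep phi.
n_sites = 21
psi0 = [0j] * n_sites
psi0[0] = 1 + 0j
psi_out = propagate(psi0, phi_start=0.4777, phi_end=2.197, z_total=20.0)
intensity = [abs(p) ** 2 for p in psi_out]
print(f"total power: {sum(intensity):.6f}")
```

Because H(z) is Hermitian, the exact evolution conserves the total intensity. RK4 conserves it only approximately, which is why a small step dz is used.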

We demonstrate experimentally the appearance of edge modes in 
the structure and their behaviour under scans of the pump parameters. 
We start by studying a structure with straight waveguides, which is 
therefore invariant in z. We inject light into two different waveguides 
in the array: one along the left edge and one along the bottom edge. 
The output light is collected after a diffraction length of 15 cm. Light 
stays confined largely to the injected edge (it mostly excites the topo- 
logical localized edge bands; Fig. 2a, b). Additionally, it spreads across 
the whole edge, implying dispersive bands of edge modes (such as the 
bands that cross the gaps in Fig. 1c), in accordance with the expected 
density-type response (Methods). The light stays confined to a single 
edge as a result of the weak coupling between states on adjoining edges; 
that is, the long-range coupling does not break the orientation that is 
associated with the two orthogonal 1D topological pumps embedded 
in the system. Some of the edge states (case (ii) above) that we excite 
have the same energies as bulk states in the open system geometry 
(Methods). These long-lived resonances further demonstrate that the 
long-range coupling is a small perturbation of the decoupled model in equation (1).

Having established that we can excite the edge modes of the 2D pump, we demonstrate their behaviour under scans of the pump parameters φ_i. We implement edge pumping by allowing the positions of the waveguides to ‘wiggle’, that is, by varying φ_i as a function of z (Fig. 1b). We vary these pump parameters within the range [0.4777, 2.197] because localized edge modes exist at these values (a full pumping cycle is not necessary to observe edge pumping from one side of the system to the other).

We fabricate separate arrays that correspond to two scenarios: (1) pumping in only the x direction; and (2) pumping in both the x and y directions. In case (1), light injected at the left edge is pumped to the right edge (Fig. 2c); however, light injected at the bottom edge is not pumped to the top because φ_y is not pumped (Fig. 2d). In case (2), we observe that the edge states are pumped both from left to right (Fig. 2e) and from bottom to top (Fig. 2f).

We injected light with several different input wavefunctions along the edge in question (including single- and double-waveguide inputs), which resulted in different amounts of overlap of the input wavefunction with each of the edge bands; clear pumping was observed in each case. These results show that an electromotive force applied in the v and w directions induces pumping of edge bands from one 3D (v, w, y) hyperplane to the opposite one in the x direction, and from one 3D (v, w, x) hyperplane to the opposite one in the y direction, as implied by the 4D Hall bulk density-type response (Methods).

Figure 2 | Edge-to-edge pumping. Images of the output facet of waveguide arrays after z = 15 cm of propagation are shown. a, b, Device with no pumping, corresponding to a model with φ_x = φ_y = 0.4777 (see Fig. 1). Light that is injected at the centre of the left (a) or bottom (b) edge excites the topological edge bands and spreads out along the edge. c, d, Pumping of φ_x (from 0.4777 to 2.197 while φ_y is held constant at 0.4777) causes the light injected at the left edge to be pumped to the right (c); no such pumping is observed when light is injected at the bottom edge (d). e, f, When φ_x and φ_y are simultaneously pumped (from 0.4777 to 2.197), light injected at the left (e) and bottom (f) edges is pumped from left to right and from bottom to top, respectively. Light in the bulk arises from imperfect coupling to edge states and from deviations from adiabaticity. The yellow dashed circles indicate the injection sites at the input facet (z = 0) and the red arrows indicate the direction of pumping. These results demonstrate that edge bands exist in the structure and appear on opposite sides of the device as a function of the pump parameters, in accordance with the density-type bulk response that is implied by the 4D Hall-type band structure of the system.

We examine the pumping of states at the corners of the arrays for the same range of φ_x and φ_y as for edge states. The presence of the corner modes (black in Fig. 1c) supports the Lorentz-type bulk response (Methods). Depending on the values of φ_x and φ_y, the corner modes can either be in the bandgap or overlap with bulk modes, where they can hybridize to form long-lived resonances.

In the experiment, the bottom-left-corner mode is directly excited and pumped along the bottom edge, as expected for the boundary mode of the 1D pump that crosses from edge to edge (Fig. 3a, b). Interestingly, when we scan φ_x and φ_y simultaneously, the bottom-left-corner mode is pumped mostly to the top-right corner (Fig. 3c) despite any hybridization with bulk modes. Such diagonal pumping under a concurrent φ_i scan agrees with the 4D symmetry of the second Chern number bulk response, that is, with the Lorentz-type transverse response (Methods).

The photonic diagonal pumping through bulk bands is expected in the decoupled model (equation (1)). There, each constituent 1D pump is characterized by its own first Chern number, so the corner modes manifest as the fully bound joint product of the protected topology at the boundary of the 2D pump. This in turn means that in our set-up the corner modes only weakly hybridize with the bulk, and the pumping is carried by long-lived resonances. We note that topological corner modes are unique in the sense that they have two fewer dimensions than the physical dimension of the system (conventional topological modes have one fewer dimension). The appearance and demonstration of such modes has recently been reported in inversion-symmetry-protected 2D systems.

Figure 3 | Corner-to-corner pumping. Images and devices are similar to those described in Fig. 2. a, With no pumping, light stays confined to the corner. b, Light is pumped from the bottom-left corner to the bottom-right corner via φ_x. c, When φ_x and φ_y are both pumped, the corner state is pumped from bottom-left to top-right. The corner state passes through the bulk band and remains localized because it is a long-lived resonance, not in the bandgap (Methods). Its appearance on the diagonally opposite corner is in accordance with the Lorentz-type response that is implied by the 4D Hall-type band structure of the system.

In conclusion, we have observed topological edge pumping associ- 
ated with the 4D quantum Hall effect in a 2D photonic system using 
synthetic dimensions. These observations imply that the system is char- 
acterized by a non-zero second Chern number. Boundary phenomena 
provide an independent observation of the physics implied by the 
second Chern number of the system, in addition to the measurement 
of the quantized nonlinear bulk response in a similar model using cold 
atoms. The realization of 4D quantum Hall physics opens up the 
possibility of realizing many new physical effects and of answering several 
open questions, including: whether a bulk measurement of the second 
Chern number can be realized in photonics via the nonlinear response 
to synthetic fields; whether arbitrarily high spatial dimensionality can 
be realized; whether interactions can lead to 4D fractional Hall physics 
when using synthetic dimensions; and whether there are other physical 
quantities that are quantized in four dimensions that can be measured 
directly using synthetic dimensions. Because photonic systems natu- 
rally allow for non-Hermitian Hamiltonians (which arise from gain and 
loss), another question is how non-Hermiticity and topological gaps 
associated with non-zero second Chern number interact. We expect 
that experimental access to 4D quantum Hall physics will open up 
many other directions for research. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Experimental specifications. The experiments were conducted using arrays of evanescently coupled waveguides fabricated in borosilicate glass using femtosecond-laser-writing technology. The waveguides are all identical in refractive index and dimension, but the inter-waveguide separation was modulated to realize the off-diagonal 2D model (equation (1)). In all cases, we observe the output image (after 15 cm of propagation) over a range of wavelengths (1,510–1,590 nm) in increments of 5 nm, and then average the output intensities over all wavelengths (Figs 2, 3). We note that the bandgap remains open over this range. We perform the averaging over wavelength to suppress interference effects that are sensitive to fabrication imperfections.
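The wavelength averaging described above can be sketched as follows. The 5-nm grid from 1,510 to 1,590 nm is taken from the text. The image representation (nested lists of intensities) and the toy data in the usage lines are hypothetical.

```python
def wavelength_grid(start_nm=1510, stop_nm=1590, step_nm=5):
    """Inclusive grid of wavelengths (nm) at which output images are recorded."""
    return list(range(start_nm, stop_nm + 1, step_nm))

def average_over_wavelengths(images):
    """Pixel-wise mean of equally sized 2D intensity images (lists of rows)."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]

wavelengths = wavelength_grid()
print(len(wavelengths), wavelengths[0], wavelengths[-1])  # 17 1510 1590

# Toy 1x2 "images" whose second pixel drifts with wavelength (hypothetical data).
toy_images = [[[1.0, w / 1000.0]] for w in wavelengths]
averaged = average_over_wavelengths(toy_images)
```

Averaging 17 equally weighted images washes out wavelength-specific interference patterns. Features that persist across the band, such as the edge or corner localization in Figs 2 and 3, survive the averaging.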

Model implementation with waveguide arrays. The diffraction of paraxial light through the structures is governed by the paraxial Schrödinger equation:

$$ i\,\partial_z \psi(x, y, z) = -\frac{1}{2k_0}\nabla_\perp^2 \psi(x, y, z) - \frac{k_0\,\Delta n(x, y, z)}{n_0}\,\psi(x, y, z), $$

where the wavefunction ψ(x, y, z) corresponds to the electric-field envelope, E(x, y, z) = ψ(x, y, z) exp(ik_0 z − iωt) E_0. Here ∇⊥² = ∂²/∂x² + ∂²/∂y² is the transverse Laplacian, Δn(x, y, z) is the change in refractive index relative to the background index n_0, and k_0 = 2πn_0/λ is the wavenumber in the background medium.

For an array of single-mode, weakly coupled waveguides, the evolution generated by the paraxial Schrödinger equation can be described using tight-binding theory, whereby light hops between the bound modes of adjacent waveguides. The hopping amplitude t associated with a given waveguide separation can be obtained by numerically computing the two lowest eigenvalues E_1 and E_2 of the full equation for a system consisting of two waveguides; the hopping amplitude is then t = (E_1 − E_2)/2.

To perform this computation for our waveguides, we used a best-fitting Gaussian model for the variation in the waveguide refractive index: Δn(x, y) = δn exp(−x²/σ_x² − y²/σ_y²), with δn = 2.8 × 10⁻³, σ_x = 3.50 μm and σ_y = 5.35 μm. These parameters were obtained by calibrating over a set of 1D test arrays. Using this profile and a background index of n_0 = 1.473, we obtain a model of the form t(s) = A exp(−γs) for the dependence of the hopping amplitudes on the waveguide separation s. Here A = A(λ) and γ = γ(λ) are wavelength-dependent parameters plotted in Extended Data Fig. 1a.

We obtain these parameters by computing the couplings along the x and y directions separately for different values of s (15–35 μm) at wavelengths of 1,450–1,650 nm, and then fitting the average of the x and y couplings to a model of the form given above. We then used this model to solve for the waveguide spacings that are required to implement the modulated hopping amplitudes defined by the Hamiltonian in equation (1).
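The calibration and inversion steps can be illustrated under stated assumptions. The sketch fits t(s) = A exp(−γs) by linear least squares on ln t; the text does not specify the fitting method, so this choice is an assumption. It then inverts the model to get the separation that produces a target hopping. The values of A and γ and the synthetic couplings are made-up placeholders, not the calibrated curves of Extended Data Fig. 1a.

```python
import math

def fit_exponential_coupling(separations, couplings):
    """Least-squares fit of ln t = ln A - gamma*s; returns (A, gamma)."""
    n = len(separations)
    logs = [math.log(t) for t in couplings]
    s_mean = sum(separations) / n
    l_mean = sum(logs) / n
    sxx = sum((s - s_mean) ** 2 for s in separations)
    sxl = sum((s - s_mean) * (l - l_mean) for s, l in zip(separations, logs))
    slope = sxl / sxx
    return math.exp(l_mean - slope * s_mean), -slope

def separation_for_coupling(t_target, amplitude, gamma):
    """Invert t(s) = A*exp(-gamma*s) for the separation that yields t_target."""
    return math.log(amplitude / t_target) / gamma

def coupling_from_two_waveguide_modes(e1, e2):
    """Hopping from the two lowest two-waveguide eigenvalues: t = (E1 - E2)/2."""
    return (e1 - e2) / 2

# Synthetic, noise-free couplings from hypothetical parameters (not calibrated values).
A_true, gamma_true = 2.0, 0.2          # hypothetical: cm^-1 and um^-1
seps = [15.0, 20.0, 25.0, 30.0, 35.0]  # um, within the 15-35 um range quoted above
ts = [A_true * math.exp(-gamma_true * s) for s in seps]
A_fit, gamma_fit = fit_exponential_coupling(seps, ts)
s_needed = separation_for_coupling(A_true * math.exp(-gamma_true * 22.5), A_fit, gamma_fit)
```

With real data, the fit would be repeated at each wavelength to obtain A(λ) and γ(λ). The inversion would then be applied bond by bond to the modulated hoppings of equation (1).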

To provide a clearer picture of the waveguide configurations used in our photonic system, we include an illustration of a 1D pump in Extended Data Fig. 1b. Varying the waveguide spacings along the propagation direction allows us to control the hopping amplitudes in a way that implements a sweep of the pump parameter φ_x. To obtain the full 2D array, we consider additional copies of such a structure stacked vertically along the y direction, with the vertical spacings determined by the hopping amplitudes associated with the y direction.
The decoupled model. Here we examine how the bulk response in an analogous electronic system (that is, one in which states are filled up to a given Fermi level) explains the behaviour of the boundary states. The model in equation (1) decomposes along the x and y directions into a sum of two independent off-diagonal Harper models, H_x(φ_x) and H_y(φ_y) (compare with equation (1)):

$$ H(\phi_x, \phi_y) = H_x(\phi_x) + H_y(\phi_y) \qquad (2) $$


Each H_i(φ_i) is a one-parameter family of 1D Hamiltonians, that is, a 1D topological pump. Treating the parameter φ_i as a Bloch momentum associated with an additional spatial dimension ĩ ∈ {v, w}, we perform a dimensional extension of this model. This yields a model that describes the 4D integer quantum Hall system on a lattice with nearest-neighbour hopping in the i direction and next-nearest-neighbour hopping in the ĩ direction.
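To make the 1D pump concrete, assume bond amplitudes of the off-diagonal Harper form t_j = t̄ + λ cos(2πj/3 + φ) for the three bonds of a b = 1/3 unit cell, with illustrative values t̄ = 1 and λ = 0.5. The 3 × 3 Bloch Hamiltonian then has the characteristic polynomial E³ − (t_1² + t_2² + t_3²)E − 2t_1t_2t_3 cos k = 0. This polynomial is derived here for illustration and is not quoted in the text.

The sketch solves the cubic with the trigonometric formula for three real roots.

- By the AM–GM inequality, neighbouring roots can only touch if |t_1| = |t_2| = |t_3|.
- For 0 < λ < t̄ that cannot happen, because the three cosines sum to zero.
- So the three bands should stay separated at every φ.

```python
import math

def cell_hoppings(phi, t_bar=1.0, lam=0.5):
    """t_j = t_bar + lam*cos(2*pi*j/3 + phi) for the three bonds of a b = 1/3 cell."""
    return [t_bar + lam * math.cos(2 * math.pi * j / 3 + phi) for j in range(3)]

def bulk_bands(k, phi, t_bar=1.0, lam=0.5):
    """Sorted roots of E^3 - S*E - 2*P*cos(k) = 0 via the trigonometric cubic formula."""
    t1, t2, t3 = cell_hoppings(phi, t_bar, lam)
    s = t1 ** 2 + t2 ** 2 + t3 ** 2
    p = t1 * t2 * t3
    r = math.sqrt(s / 3)
    arg = max(-1.0, min(1.0, p * math.cos(k) / r ** 3))  # clamp round-off
    theta = math.acos(arg) / 3
    return sorted(2 * r * math.cos(theta - 2 * math.pi * m / 3) for m in range(3))

def cubic_residual(energy, k, phi, t_bar=1.0, lam=0.5):
    """Characteristic-polynomial residual; close to zero for a true eigenvalue."""
    t1, t2, t3 = cell_hoppings(phi, t_bar, lam)
    s, p = t1 ** 2 + t2 ** 2 + t3 ** 2, t1 * t2 * t3
    return energy ** 3 - s * energy - 2 * p * math.cos(k)

# Global (indirect) gaps at each phi, taken over a grid in k.
ks = [2 * math.pi * i / 60 for i in range(60)]
lower_gaps, upper_gaps = [], []
for phi in ks:
    bands = [bulk_bands(k, phi) for k in ks]
    lower_gaps.append(min(b[1] for b in bands) - max(b[0] for b in bands))
    upper_gaps.append(min(b[2] for b in bands) - max(b[1] for b in bands))
print(f"smallest gaps over phi: {min(lower_gaps):.3f}, {min(upper_gaps):.3f}")
```

Setting lam=0.0 recovers a uniform chain folded into three bands, which is gapless. This is a quick sanity check of the formula.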

For b = 1/3, the spectrum of the 1D pump (2D quantum Hall) system consists of three bands (Extended Data Fig. 2a). Each band n has an associated non-zero first Chern number ν_n (ref. 34 uses a different symbol),

$$ \nu_n = \frac{1}{2\pi}\int_0^{2\pi}\! dk_x \int_0^{2\pi}\! d\phi_x\, \mathcal{F}_n(k_x, \phi_x), $$

which is an integral over the Berry curvature (also known as the Chern density) of the filled nth band,

$$ \mathcal{F}_n = i\,\mathrm{Tr}\left(P_n\left[\partial_{k_x} P_n,\ \partial_{\phi_x} P_n\right]\right), $$

where we have defined the spectral projector P_n onto all states in the nth band.
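The band Chern numbers can be checked numerically. The sketch replaces the continuum Berry-curvature integral with the standard lattice discretization of Fukui, Hatsugai and Suzuki on an n × n grid over the (k, φ) torus. This is a common numerical stand-in, not necessarily the procedure used in the paper. It is applied to the same hypothetical b = 1/3 chain (t̄ = 1, λ = 0.5).

Eigenvectors of the 3 × 3 Bloch matrix are obtained from cross products of two rows of H − E, which avoids external libraries. For these illustrative parameters, the result should be three integers that sum to zero, in the pattern ±(1, −2, 1). The overall sign depends on orientation conventions, and the paper's own values are those shown in Extended Data Fig. 2a.

```python
import cmath
import math

def bloch_matrix(k, phi, t_bar=1.0, lam=0.5):
    """3x3 Bloch Hamiltonian of the hypothetical b = 1/3 chain; returns (H, hoppings)."""
    t1, t2, t3 = (t_bar + lam * math.cos(2 * math.pi * j / 3 + phi) for j in range(3))
    phase = cmath.exp(1j * k)  # Bloch phase on the bond crossing the cell boundary
    return [[0, t1, t3 / phase], [t1, 0, t2], [t3 * phase, t2, 0]], (t1, t2, t3)

def band_energies(k, hops):
    """Sorted roots of E^3 - S*E - 2*P*cos(k) = 0."""
    t1, t2, t3 = hops
    r = math.sqrt((t1 ** 2 + t2 ** 2 + t3 ** 2) / 3)
    arg = max(-1.0, min(1.0, t1 * t2 * t3 * math.cos(k) / r ** 3))
    theta = math.acos(arg) / 3
    return sorted(2 * r * math.cos(theta - 2 * math.pi * m / 3) for m in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def eigenvector(h, energy):
    """Null vector of (H - E) from the largest cross product of two of its rows."""
    m = [[h[i][j] - (energy if i == j else 0) for j in range(3)] for i in range(3)]
    candidates = [cross(m[0], m[1]), cross(m[0], m[2]), cross(m[1], m[2])]
    v = max(candidates, key=lambda c: sum(abs(x) ** 2 for x in c))
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

def inner(a, b):
    return sum(x.conjugate() * y for x, y in zip(a, b))

def chern_numbers(n=30, t_bar=1.0, lam=0.5):
    """Lattice Chern number of each band on an n x n grid over the (k, phi) torus."""
    vecs = {}
    for i in range(n):
        for j in range(n):
            k, phi = 2 * math.pi * i / n, 2 * math.pi * j / n
            h, hops = bloch_matrix(k, phi, t_bar, lam)
            vecs[i, j] = [eigenvector(h, e) for e in band_energies(k, hops)]
    flux = [0.0, 0.0, 0.0]
    for i in range(n):
        for j in range(n):
            i1, j1 = (i + 1) % n, (j + 1) % n
            for band in range(3):
                u1, u2 = vecs[i, j][band], vecs[i1, j][band]
                u3, u4 = vecs[i1, j1][band], vecs[i, j1][band]
                loop = inner(u1, u2) * inner(u2, u3) * inner(u3, u4) * inner(u4, u1)
                flux[band] += cmath.phase(loop)  # gauge-invariant plaquette Berry phase
    return [f / (2 * math.pi) for f in flux]

raw = chern_numbers()
chern = [round(c) for c in raw]
print(chern)
```

Because each plaquette phase is gauge invariant, the arbitrary phases of the individual eigenvectors drop out. The grid only needs to be fine enough that no plaquette phase approaches ±π.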
Energy gaps in the 2D Hall effect are also characterized by first Chern numbers. The first Chern number of a spectral gap is the sum of the first Chern numbers of the bands below that gap in energy. The first Chern number of the bandgap manifests through the quantization of the Hall conductance in response to an applied in-plane electric field. In our case, for example, I_x = (e²/h) E_v Σ_n ν_n, where I_x denotes the current density along the x direction, E_v is an electric field along the v direction, and the sum is over all filled bands. This quantized bulk response has corresponding edge states: gapless boundary states appear in a finite sample (as many as the sum of the Chern numbers of the bands below a given gap) and carry the transverse quantized conductance.

As discussed in the main text, the eigenstates of the full Hamiltonian (equations (1) and (2)) are tensor products of the eigenstates of the two independent Harper models, |ψ_mn⟩ = |ψ_m⟩ ⊗ |ψ_n⟩, where m enumerates the states in the x-v plane and n those in the y-w plane. Their associated energies are E_mn = E_m + E_n, so that each pair of bands from the decoupled models yields a band of the 2D pump (4D quantum Hall) model. Therefore, in a finite system, because each constituent 1D pump has bulk and boundary modes, the tensor-product eigenstates can be categorized as bulk-bulk, bulk-boundary and boundary-boundary. A colour-coded illustration of the resulting band structure is shown in Extended Data Fig. 2b.
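The tensor-product statement can be checked on the smallest possible example. In the sketch, H = H_x ⊗ 1 + 1 ⊗ H_y is built for two hypothetical two-site chains with known eigenpairs. The sketch verifies that the product state is an eigenstate with energy E_m + E_n. It then labels product states by the bulk or boundary character of their factors.

The toy spectrum and its labels are placeholders, not the Harper spectra of Extended Data Fig. 2a.

```python
from itertools import product

def kron(a, b):
    """Kronecker product of two square matrices given as lists of rows."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Two hypothetical two-site chains with known eigenpairs (E, vector).
hx, hy = [[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.5], [0.5, 0.0]]
s = 1 / 2 ** 0.5
ex, ux = 1.0, [s, s]        # eigenpair of hx
ey, uy = -0.5, [s, -s]      # eigenpair of hy

h2d = add(kron(hx, identity(2)), kron(identity(2), hy))  # H_x (x) 1 + 1 (x) H_y
psi = [a * b for a in ux for b in uy]                     # |psi_m> (x) |psi_n>
residual = max(abs(x - (ex + ey) * y) for x, y in zip(matvec(h2d, psi), psi))

# Label product states from the 1D character of their two factors.
CATEGORY = {("bulk", "bulk"): "2D bulk", ("bulk", "boundary"): "2D edge",
            ("boundary", "bulk"): "2D edge", ("boundary", "boundary"): "2D corner"}

def product_states(states_x, states_y):
    """All (E_m + E_n, category) pairs from lists of (energy, 'bulk'/'boundary')."""
    return [(e1 + e2, CATEGORY[k1, k2]) for (e1, k1), (e2, k2) in product(states_x, states_y)]

toy = [(-1.0, "bulk"), (0.5, "boundary"), (1.0, "bulk")]  # hypothetical 1D spectrum
pairs = product_states(toy, toy)
counts = {c: sum(1 for _, cat in pairs if cat == c) for c in set(CATEGORY.values())}
print(residual, counts)
```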

The resulting Minkowski-sum spectrum is not always gapped: depending on the hopping amplitudes, the gaps of the joint spectrum may close. If the gaps are closed, we can no longer discuss the topology of the combined spectrum, because any small perturbation will mix the states from the different bands.

When the spectral gaps are open, the bulk-boundary and boundary-boundary modes lie, for some φ_i, at energies within the gaps and, for others, at energies in the bulk bands. Therefore, the boundary-boundary (2D corner) modes that overlap with the bulk are generally expected to become finite-lifetime resonances once higher-neighbour hoppings destroy the tensor-product structure of the eigenstates. Nonetheless, the in-gap bulk-boundary and boundary-boundary modes are protected against arbitrary perturbations that do not close the gap, and they are the surface states associated with a non-zero second Chern number.
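Whether the joint spectrum stays gapped can be checked from the band intervals alone, because each 2D band is the interval sum [a + c, b + d] of a pair of 1D bands. The sketch merges these intervals and reports the energy windows left uncovered. The two sets of 1D band edges are hypothetical: one narrow, which leaves gaps open, and one wide, which closes them.

```python
def pair_bands(bands_x, bands_y):
    """Energy interval of each 2D band (m, n): Minkowski sum of two 1D band intervals."""
    return {(m, n): (lo_x + lo_y, hi_x + hi_y)
            for m, (lo_x, hi_x) in enumerate(bands_x)
            for n, (lo_y, hi_y) in enumerate(bands_y)}

def open_gaps(intervals):
    """Energy windows (lo, hi) not covered by any of the given intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [(left[1], right[0]) for left, right in zip(merged, merged[1:])]

# Hypothetical 1D band intervals (same for x and y).
narrow = [(-2.0, -1.5), (-0.3, 0.3), (1.5, 2.0)]
wide = [(-2.0, -0.5), (-0.4, 0.4), (0.5, 2.0)]
gaps_narrow = open_gaps(pair_bands(narrow, narrow).values())
gaps_wide = open_gaps(pair_bands(wide, wide).values())
print(gaps_narrow)
print(gaps_wide)
```

Even though both 1D spectra are gapped, only the narrow-band case leaves open gaps in the 2D spectrum. This is the situation in which second Chern numbers can be assigned.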

Second Chern number. Let us consider an energy E_j in the jth gap of the 2D pump (4D quantum Hall) system (Extended Data Fig. 2b). The second Chern number ν_j associated with this gap (ref. 34 uses a different symbol) is a 4D integral, over the Brillouin zone, of a generalized Berry curvature constructed from P_j(k). Here P_j(k) is the projector onto the subspace spanned by the eigenstates at Bloch momentum k = (φ_x, φ_y, k_x, k_y) with energies below the gap. Using the decomposition of H discussed above, ν_j can be written in terms of the first Chern numbers of the Harper models as

$$ \nu_j = \sum_{\substack{\text{band pairs } m, n \\ E_{mn} < E_j}} \nu_m^{xv}\,\nu_n^{yw} $$
where ν_m^{xv} and ν_n^{yw} are the first Chern numbers associated with the mth band in the x-v plane and the nth band in the y-w plane, respectively. Combining this result with the first Chern numbers shown in Extended Data Fig. 2a, the second Chern numbers associated with the lower and upper gaps of the 2D pump (4D quantum Hall) Hamiltonian are ν_j = +1 and −1, respectively. The Hamiltonian that governs our photonic system does not decompose in the way discussed above, owing to the presence of higher-neighbour couplings. Nevertheless, the upper and lower gaps remain open (see Fig. 1) and, as a result, the associated second Chern numbers remain unchanged.
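The band-pair sum above can be evaluated mechanically once the 1D band intervals and first Chern numbers are known. The sketch counts a pair (m, n) as lying below E_j only if its whole Minkowski-sum interval is below the gap. It raises an error if E_j falls inside a band, where no gap exists.

The interval values and the first Chern numbers (1, −2, 1) are hypothetical placeholders, not the values of Extended Data Fig. 2a. With these placeholder inputs, the lowest and highest 2D gaps come out as +1 and −1.

```python
def pair_bands(bands_x, bands_y):
    """Energy interval of 2D band (m, n): Minkowski sum of 1D intervals m and n."""
    return {(m, n): (lo_x + lo_y, hi_x + hi_y)
            for m, (lo_x, hi_x) in enumerate(bands_x)
            for n, (lo_y, hi_y) in enumerate(bands_y)}

def second_chern(gap_energy, bands_x, bands_y, chern_x, chern_y):
    """nu_j = sum over band pairs lying entirely below E_j of nu_m^{xv} * nu_n^{yw}."""
    total = 0
    for (m, n), (lo, hi) in pair_bands(bands_x, bands_y).items():
        if hi < gap_energy:
            total += chern_x[m] * chern_y[n]
        elif lo < gap_energy:
            raise ValueError(f"E_j = {gap_energy} lies inside band pair {(m, n)}")
    return total

bands = [(-2.0, -1.5), (-0.3, 0.3), (1.5, 2.0)]  # hypothetical 1D band intervals
chern_1d = [1, -2, 1]                             # hypothetical first Chern numbers
for e_gap in (-2.65, -0.9, 0.9, 2.65):
    print(e_gap, second_chern(e_gap, bands, bands, chern_1d, chern_1d))
```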
Bulk responses and their corresponding edge phenomena. Measuring the second Chern number directly via the bulk response requires applying both external electric and magnetic fields. However, a non-zero second Chern number implies the presence of surface states, irrespective of whether external fields are applied. In this section, we explain the relationship between the presence of the surface states in the model and the second Chern number, from the point of view of topological pumping.

The second Chern number of the jth bandgap has an associated quantized nonlinear bulk response

$$ I_\alpha = \frac{\nu_j e^2}{2 h \Phi_0}\,\epsilon_{\alpha\beta\gamma\delta}\,E_\beta B_{\gamma\delta}, $$

where repeated indices are summed and:

- I_α denotes the current density along the α direction;
- Φ_0 is the flux quantum;
- E_β is an electric-field perturbation along the β direction;
- B_γδ is a magnetic-field perturbation in the γ-δ plane;
- ε_αβγδ is a Levi-Civita symbol that highlights the 4D non-commutative nature of the response.

The second Chern number ν_j is defined as the sum over all bands up to the jth of a 4D volume integral over a generalized 4D Berry curvature of the given band.

In our spinless case, we can write the 4D Berry curvature in terms of the 2D Berry curvatures that exist in the two orthogonal planes associated with the independent models. Let us consider these orthogonal planes to be x-v and y-w. In addition, for the choice of boundary conditions in our experiment, let us focus on the responses in the direction α = x and study their bulk-edge correspondence. The responses in the α = y direction will be similar. Having fixed the response direction, there are various choices for the orientation of the perturbing fields in four dimensions. These can be split into density-type responses and Lorentz-type responses.
Density-type response. Consider the case where the extrinsic perturbing field B_γδ is set in a plane for which there is a non-trivial Berry curvature from the underlying model. For responses in the α = x direction, this occurs when γδ = yw. Correspondingly, the orientation of the electric-field perturbation is β = v. Owing to the non-trivial intrinsic Berry curvature in the x-v plane, E_v also generates a 2D quantum Hall-like response, and the bulk response is

$$ I_x = \frac{e^2}{h}\,\nu^{xv} E_v + \frac{e^2}{h\Phi_0}\,\nu_j\,\epsilon_{xvyw}\,E_v B_{yw}, \qquad I_y = I_v = I_w = 0, $$

where ν^{xv} contains the sum over the first Chern numbers of the filled bands. It is now apparent why we denote this response as ‘density-like’: the bulk response is a 2D quantum Hall-like response multiplied by a particle-density factor that results from the integration over the 4D volume. The second Chern number response here can be understood as a Středa-formula correction to the first Chern number response.
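The second term of the density-type response can be traced back to the general nonlinear response I_α = (ν_j e²/2hΦ_0) ε_αβγδ E_β B_γδ. Two standard properties are assumed here rather than stated in the text: the antisymmetry B_wy = −B_yw and ε_xvwy = −ε_xvyw. With α = x, β = v and the field confined to the y-w plane, only two terms of the index sum survive:

$$
\begin{aligned}
I_x &= \frac{\nu_j e^2}{2h\Phi_0}\left(\epsilon_{xvyw}\,E_v B_{yw} + \epsilon_{xvwy}\,E_v B_{wy}\right) \\
    &= \frac{\nu_j e^2}{2h\Phi_0}\left(\epsilon_{xvyw}\,E_v B_{yw} + (-\epsilon_{xvyw})\,E_v\,(-B_{yw})\right)
     = \frac{\nu_j e^2}{h\Phi_0}\,\epsilon_{xvyw}\,E_v B_{yw}.
\end{aligned}
$$

The factor of 2 from the two orderings of the field indices cancels the 1/2 in the prefactor.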

To support such a response in finite-sized systems, the corresponding edge phenomena must include a band of modes that traverses the gap. The density of this edge band is modulated by the magnetic-field perturbation. In addition, from the response to E_v we conclude that the in-gap band is dispersive with respect to k_v. Repeating this argument for the density-type response in the y direction, we expect an additional in-gap band that is dispersive with respect to k_w.

In 2D topological pumping, we generate the electric field E_v using Faraday’s law of induction, that is, by modulating φ_x. Correspondingly, the density-type quantized 4D quantum Hall response implies that, within a full 0–2π cycle of φ_x, a band of states must cross the gap and appear ν^{xv} times on either side of the sample, which has open boundary conditions in the x direction. The density of this band is modulated by the external magnetic-field perturbation and thus accommodates the density-type second Chern number response. Following the same arguments, similar bands must appear upon scans of φ_y to support the response in the α = y direction. In the photonic experiment, we excite these edge bands directly (as well as, inevitably, bulk bands) and show that they truly carry modes from one side of the sample to the other in both the x and y directions.

From the above discussion, it is apparent that the observation of edge-to-edge pumping implies that a full band spectrum supports density-type second Chern bulk responses, and it suffices to see these responses as a function of scans of φ_x and φ_y. In terms of edge physics, adding a perturbing B_yw field is not illustrative: the intrinsic field has already set up the conditions (via the density response) for a current that arises from both the first and second Chern numbers.

Lorentz-type responses. Consider the case where the extrinsic perturbing field B_γδ is set in a plane for which there is no Berry curvature from the underlying model. For responses in the α = x direction, this occurs when γδ ∈ {vy, vw}. Correspondingly, the orientation of the electric-field perturbation is β ∈ {w, y}. We are interested in 2D topological pumping, that is, in generating the electric field using Faraday’s law of induction; consequently, we do not apply the electric-field perturbation in the y direction. Because we cannot apply a B_vw perturbation between the two dynamical axes of the pump, we are left with the response

$$ I_x = \frac{e^2}{h\Phi_0}\,\nu_j\,\epsilon_{xwyv}\,E_w B_{yv}. $$

Because E_w is generated by pumping φ_y, this response means that ν_j charge-carrier modes must appear within the gap every 1/B_yv cycles at each end of the x axis.
In the 2D model, the Lorentz-type magnetic-field perturbation enters (in a suitable gauge) as a spatial modulation of the model, by changing the modulated hopping:

$$ t_x(\phi_x) \to \bar{t}_x + \lambda_x\cos(2\pi b_x x + 2\pi B_{yv}\, y + \phi_x), $$

$$ t_y(\phi_y) \to \bar{t}_y + \lambda_y\cos(2\pi b_y y + \phi_y). $$

In the y-w plane, a first Chern number bulk response occurs as a function of scans of φ_y, leading to a gradual change in the coordinate y. Therefore, owing to the magnetic-field perturbation B_yv, a slow modulation of the potential in the x direction also occurs as φ_y is scanned.
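The gauge choice above can be illustrated directly. With the B_yv term included, moving one row up in y changes t_x exactly as advancing φ_x by 2πB_yv would. So a displacement of 1/B_yv rows sweeps φ_x through a full cycle.

In the sketch, B_yv = 0.1, the lattice site and the remaining amplitudes are illustrative placeholders.

```python
import math

def lorentz_hoppings(x, y, phi_x, phi_y, b_x=1/3, b_y=1/3, b_yv=0.1,
                     t_bar=(1.0, 1.0), lam=(0.5, 0.5)):
    """(t_x, t_y) at site (x, y); the B_yv term enters t_x only.
    All parameter values are illustrative placeholders."""
    t_x = t_bar[0] + lam[0] * math.cos(2 * math.pi * b_x * x + 2 * math.pi * b_yv * y + phi_x)
    t_y = t_bar[1] + lam[1] * math.cos(2 * math.pi * b_y * y + phi_y)
    return t_x, t_y

x0, y0, px, py = 4, 2, 0.3, 1.1
base = lorentz_hoppings(x0, y0, px, py)[0]
shift_by_row = lorentz_hoppings(x0, y0 + 1, px, py)[0]                 # move up one row
shift_by_phase = lorentz_hoppings(x0, y0, px + 2 * math.pi * 0.1, py)[0]  # advance phi_x
full_cycle = lorentz_hoppings(x0, y0 + 10, px, py)[0]                  # 1/B_yv = 10 rows
print(shift_by_row - shift_by_phase, full_cycle - base)
```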

This slow modulation means that 1/B_yv cycles of φ_y generate, in the same time, a full scan of φ_x from 0 to 2π (see also the bulk-pumping experiment in cold atoms). In a finite-sized system, this Lorentz-type bulk response implies that boundary modes must appear and cross the gap in response to a joint modulation of both pump parameters φ_x and φ_y. This is precisely the corner mode shown in black in Fig. 1c and Extended Data Fig. 2b, and for these gaps |ν_j| = 1.

Data availability. The data that support the findings of this study are available 
from the corresponding authors on reasonable request. 
Extended Data Figure 1 | Waveguide coupling parameters and illustration of a 1D pump. a, The overall scale A (dashed red line) and exponential decay constant γ (solid orange line) that describe the inter-waveguide coupling as a function of their separation s: t(s) = A exp(−γs). The parameters were obtained using a thorough calibration procedure (see Methods) and are plotted as a function of wavelength. b, An additional illustration of the waveguide spacing used to implement our topological pump. To simplify the diagram, we show a 1D waveguide array, which corresponds to an implementation of a 1D pump. This configuration can be thought of as resulting from a constant-y slice through the full 2D waveguide array.
Extended Data Figure 2 | Nearest-neighbour band structure obtained from two decoupled models. See equation (2). a, Finite-sample band structure (energy E versus pump parameter) for a single Harper model aligned along the x direction. Boundary modes highlighted in orange (red) are localized on the left (right) end of the 1D sample. The first Chern number associated with each bulk band is also shown. b, Band structure for the fully separable 2D pump taken along the path φ_x = φ_y, for a system that decomposes into two independent Harper models. Each band in b is obtained by summing a pair of bands from a. The resulting bands can be classified by the types of state that appear in the sum: bulk-bulk (2D bulk), bulk-boundary (2D edge) or boundary-boundary (2D corner). These types are respectively coloured grey, red or orange, and black. As a function of φ_i, the edge modes form ‘dispersive’ bands that thread through the 2D bulk gaps. The corner modes thread between the edge bands and are therefore forced to cross 2D bulk bands along their φ_i trajectory.
One-pot growth of two-dimensional lateral 
heterostructures via sequential edge-epitaxy 


Prasana K. Sahoo, Shahriar Memaran, Yan Xin, Luis Balicas & Humberto R. Gutiérrez 


Two-dimensional heterojunctions of transition-metal dichalcogenides have great potential for application in low-power, high-performance and flexible electro-optical devices, such as tunnelling transistors, light-emitting diodes, photodetectors and photovoltaic cells. Although complex heterostructures have been fabricated via the van der Waals stacking of different two-dimensional materials, the in situ fabrication of high-quality lateral heterostructures with multiple junctions remains a challenge. Transition-metal-dichalcogenide lateral heterostructures have been synthesized via single-step, two-step or multi-step growth processes. However, these methods lack the flexibility to control, in situ, the growth of individual domains.

In situ synthesis of multi-junction lateral heterostructures avoids the multiple exchanges of sources or reactors required by previous approaches. Such exchanges expose the edges to ambient contamination, compromise the homogeneity of domain size in periodic structures, and result in long processing times. Here we report a one-pot synthetic approach, using a single heterogeneous solid source, for the continuous fabrication of lateral multi-junction heterostructures consisting of monolayers of transition-metal dichalcogenides. The sequential formation of heterojunctions is achieved solely by changing the composition of the reactive gas environment in the presence of water vapour. This enables selective control of the water-induced oxidation and volatilization of each transition-metal precursor, as well as its nucleation on the substrate, leading to sequential edge-epitaxy of distinct transition-metal dichalcogenides.

Photoluminescence maps confirm the sequential spatial modulation of the bandgap, and atomic-resolution images reveal defect-free lateral connectivity between the different transition-metal-dichalcogenide domains within a single crystal structure. Electrical transport measurements revealed diode-like responses across the junctions. Our new approach offers greater flexibility and control than previous methods for continuous growth of transition-metal-dichalcogenide-based multi-junction lateral heterostructures. These findings could be extended to other families of two-dimensional materials, and establish a foundation for the development of complex and atomically thin in-plane superlattices, devices and integrated circuits.

Chemical vapour deposition can produce high-quality transition-metal dichalcogenide (TMD) monolayers and heterostructures. The one-pot synthesis strategy involves using a single solid source, composed of MoX2 and WX2 powders placed within the same boat at high temperatures. Implementing this strategy for the fabrication of TMD-based heterostructures requires regulating the relative amounts of precursors in the gaseous phase through controlled vaporization from the solid sources, and/or promoting the selective deposition of individual compounds onto the substrate held at lower temperatures. In general, MX2 compounds (where M = W, Mo and X = S, Se) have high dissociation temperatures. However, the presence of water vapour at high temperatures promotes the formation of highly volatile species, including metal oxides and hydroxides.

Using a one-pot strategy (Extended Data Fig. 1), we found that the selective growth of each TMD can be controlled independently, solely by switching the carrier gas (Extended Data Figs 2, 3): N2 + H2O(g) promotes the growth of MoX2, whereas switching to Ar + H2 (5%) stops the growth of MoX2 and promotes the growth of WX2. When the carrier gas is cyclically switched back and forth, heterostructures consisting of a sequence of multi-junctions can be synthesized continuously (Fig. 1 and Extended Data Fig. 4).

The growth mechanism can be summarized as follows (see Methods and Extended Data Fig. 5 for a detailed discussion). N2 + H2O(g) (without H2) favours the evaporation of both the molybdenum and the tungsten precursors (oxides and hydroxides). However, the gaseous tungsten precursors are mainly hydroxides, which are volatile at temperatures above 500 °C (ref. 17), so only molybdenum precursors are deposited on the substrate. A sudden switch of the carrier gas to Ar + H2 depletes the supply of molybdenum precursors, while tungsten precursors continue to be supplied owing to the slower reduction rate of WO3. This vapour-phase modulation of the oxide species is the key driving force for the sequential growth of lateral heterojunctions.
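The switching logic described above can be written as simple bookkeeping. In the sketch, each carrier-gas step grows one domain, and the number of junctions follows from how many times the grown material changes. This is an idealization that ignores alloying at the interfaces.

The gas-to-material mapping follows the text for the selenide case. The step durations are hypothetical placeholders; the text states only that each domain's width is set by the growth time of its cycle.

```python
# Material grown under each carrier gas, as described in the text (selenide case).
GROWTH_BY_GAS = {"N2 + H2O(g)": "MoSe2", "Ar + H2 (5%)": "WSe2"}

def switching_schedule(n_domains, durations_min, start_gas="N2 + H2O(g)"):
    """Alternate the two carrier gases; one (gas, duration) step per domain."""
    gases = list(GROWTH_BY_GAS)
    first = gases.index(start_gas)
    return [(gases[(first + i) % 2], durations_min[i]) for i in range(n_domains)]

def expected_structure(schedule):
    """Domain sequence (innermost first) and the number of lateral junctions."""
    domains = [GROWTH_BY_GAS[gas] for gas, _ in schedule]
    junctions = sum(1 for a, b in zip(domains, domains[1:]) if a != b)
    return domains, junctions

# Four steps give a three-junction MoSe2/WSe2/MoSe2/WSe2 heterostructure.
plan = switching_schedule(4, durations_min=[3, 2, 3, 2])  # durations are hypothetical
domains, junctions = expected_structure(plan)
print(domains, junctions)
```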

Figure 1a–d shows optical images of a series of distinct multi-junction heterostructures, with alternating MoSe2 (dark contrast) and WSe2 (bright contrast) regions. The number of junctions is controlled by the number of gas-switching cycles, and the lateral size of each domain (width) is determined by the growth time of each individual cycle (Fig. 1b, c and Extended Data Fig. 2a–h). The growth rate of MoSe2 and WSe2 domains was found to be a function of the substrate temperature (Extended Data Figs 1c, 2i). The single-crystalline heterostructure islands, up to 285 μm in size (Fig. 1a), are among the largest reported so far. Spatially resolved Raman and micro-photoluminescence spectroscopies confirmed the sequential distribution of the chemical composition as well as the local optical properties within the heterostructures.
Raman spectra (Fig. 1e) collected from regions 1 and 3 (Fig. 1a, inset) in the heterostructure exhibit the A1g phonon mode (240 cm−1) and the E2g(M) shear mode (shoulder at 249 cm−1), corresponding to monolayer MoSe2, whereas regions 2 and 4 display the A1g (250 cm−1) and the 2LA(M) (260 cm−1) phonon modes of monolayer WSe2 
(refs 12, 21). Raman intensity maps at 240 cm−1 and 250 cm−1 further corroborated the spatial distribution of the MoSe2 and WSe2 domains, respectively (Extended Data Fig. 3c–f). The photoluminescence spectra (Fig. 1f) show a strong single excitonic peak at around 1.52 eV for MoSe2 (regions 1 and 3) and 1.6 eV for WSe2 (regions 2 and 4). The integrated photoluminescence intensity maps (Fig. 1g) and the corresponding composite map (Fig. 1h) of the heterostructures reveal the alternate formation of concentric triangular domains of MoSe2 and WSe2 monolayers (Extended Data Fig. 4a, c–e).

The contour plots of the normalized photoluminescence intensity as a function of position across three-junction (Fig. 1i) and five-junction (Fig. 1j) heterostructures clearly show the evolution of the distinct excitonic transitions within each domain. Across the first junction (marked 1), the MoSe2 photoluminescence peak at 1.53 eV gradually shifts to higher energies until it reaches 1.60 eV, corresponding to the WSe2 domain: a total shift of 70 meV. At the second and third junctions there is an abrupt 
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change in the position of the photoluminescence peak, suggesting the 
formation of sharper interfaces with less alloying. At these sharp inter- 
faces, the photoluminescence spectra are characterized by an overlap 
of both peaks; this is due to the submicrometre laser spot size in the 
confocal microscope simultaneously probing both sides of the inter- 
face. Although junctions 2 and 3 are both sharper than junction 1, 
it is worth noting that junction 3 is not as sharp as junction 2; this 
behaviour has been consistently observed in all samples. It indicates 
that a transition from a MoSe, to a WSe domain results in a less abrupt, 
slightly ‘smoother’ interface between the two materials, whereas the 
transition from a WSe, to a MoSe2 domain produces atomically sharp 
interfaces.

64 | NATURE | VOL 553 | 4 JANUARY 2018

Figure 1 | Multi-junction lateral heterostructures and interfaces based on
MoSe2 and WSe2. a, Low-magnification optical image of three-junction
heterostructures. The inset shows a larger magnification of the area within
the dashed box. The dark-contrast regions correspond to MoSe2, the
bright-contrast regions to WSe2. b, c, Optical images of five-junction
heterostructures. The difference in thickness of the MoSe2 layers in b and c
is seen by the difference in thickness of the dark-contrast regions.
d, Seven-junction heterostructure with variable domain widths. The
underlying colour bars in b-d depict the growth timescale: from left to
right (pink, MoSe2; green, WSe2), each division (black line) corresponds to
approximately 120 s. e, f, Raman (e) and photoluminescence (PL, f) spectra
of a at positions 1, 2, 3 and 4. g, h, Photoluminescence intensity maps for
the WSe2 (1.6 eV, top) and MoSe2 (1.52 eV, bottom) domains (g), and
composite photoluminescence map (h) for the heterostructure in b.
i, j, Contour colour plots of the normalized photoluminescence intensity of
three-junction (i) and five-junction (j) heterostructures, along the arrows
in the insets. k, l, Z-contrast atomic-resolution HAADF-STEM images of pure
MoSe2 (k) and WSe2 (l). m, n, Atomic-resolution HAADF-STEM images of the
smooth (m) and sharp (n) interfaces, with their corresponding
Fourier-transform patterns and composition profiles (atomic fraction of
tungsten per vertical atomic column). The smooth and the sharp interfaces
have average interface widths of 6 nm (21 atomic columns) and 1 nm
(4 atomic columns), respectively. o, p, Scattered electron intensity colour
plot (o) and associated atomic ball model (p) for the junction in n.
q, Electron intensity profile along the white box in i. Scale bars,
10 μm (a (inset), b-d, g, h).

This was verified by atomic-resolution Z-contrast imaging
using high-angle annular dark-field scanning transmission electron
microscopy (HAADF-STEM) (Fig. 1k-o), which provides insight into
both the crystalline quality and the chemical distribution at
heterojunctions at high spatial resolution. The atoms in monolayer MoSe2
(Fig. 1k) and WSe2 (Fig. 1l) have a hexagonal (honeycomb-like)
arrangement with D3h symmetry. The Mo and Se2 atomic positions
yield a similar intensity of scattered electrons, whereas the
W sites display twice that intensity (Fig. 1q). Figure 1k, l shows pure
MoSe2 and WSe2 regions, respectively, within the same heterostructure,
confirming that the evaporation-deposition process is very selective
even though both solid precursors (MoSe2 and WSe2) are present in
the heterogeneous source. Consistent with the photoluminescence
observations, two types of interface were identified: MoSe2-WSe2
interfaces (Fig. 1m), which display a smooth, less abrupt chemical
transition with some degree of alloy formation, and WSe2-MoSe2
interfaces, which are atomically sharp (Fig. 1n).
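The Fig. 1 caption quotes interface widths both in nanometres and in atomic columns. The ratio 6 nm for 21 columns gives roughly 0.29 nm per column. The hedged sketch below shows one way such a width could be read off a per-column tungsten-fraction profile. Three choices are assumptions, not the authors' stated procedure: the 10-90% rise criterion, the default column spacing derived from those two caption numbers, and the function name `interface_width`.

```python
def interface_width(w_fraction, low=0.1, high=0.9, column_spacing_nm=6.0 / 21):
    """Estimate an interface width from a per-column tungsten-fraction profile.

    Uses a low-high (default 10-90%) rise criterion, which is an assumption;
    returns (width_in_columns, width_in_nm).
    """
    profile = list(w_fraction)
    if len(profile) < 2:
        raise ValueError("need at least two atomic columns")
    if profile[0] > profile[-1]:
        profile.reverse()  # orient the profile from the Mo side to the W side
    i_low = next((i for i, f in enumerate(profile) if f >= low), None)
    i_high = next((i for i, f in enumerate(profile) if f >= high), None)
    if i_low is None or i_high is None:
        raise ValueError("profile does not span the low-high range")
    columns = i_high - i_low
    return columns, columns * column_spacing_nm


# Hypothetical smooth-interface profile (W fraction per column, Mo side first).
profile = [0.0, 0.0, 0.0, 0.05, 0.2, 0.5, 0.8, 0.95, 1.0, 1.0]
print(interface_width(profile))
```

A stricter criterion (for example 5-95%) would report wider interfaces for the same profile, so widths are only comparable when measured the same way.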

The different interfaces are a consequence of the different oxidation
and reduction rates of molybdenum and tungsten compounds, as well
as of the gas-switching mechanism. When the carrier gas switches from
H2O to H2, the residual metal oxide content depletes rapidly. Because
H2O cannot be completely replaced by H2 in the present experimental
setup, a small amount of Mo is co-deposited in the WX2 domain, forming
a smooth interface (MoX2-WX2).
Under H2 flow, while the WX2 domain continues to grow, the molybdenum
oxide(s) are converted completely to metallic molybdenum
over the MoX2 source. When the conditions are reversed, switching
from H2 back to H2O vapour, the low-index W sub-oxides begin
to form high-index W sub-oxides, as indicated by the slow weight-loss
rate of the tungsten oxide precursor in H2O (Extended Data
Fig. 5c, d). Meanwhile, H2O must first re-oxidize the metallic
molybdenum (formed over the MoX2 source during exposure to H2
gas) to MoO2, which is relatively slower than the direct oxidation of the
MoX2 source. This might delay the supply of MoO2 vapour to
the already present WX2 edge sites, and hence result in a sharp transition
from the WX2 to the MoX2 domain. Further optimization of the
gas-switching process could lead to the generation of sharp interfaces only.
Figure 2 | Multi-junction lateral heterostructures based on MoS2 and WS2.
a, Optical image of a heterostructure composed of three MoS2-WS2 junctions.
b, SEM image of the region of the heterostructure within the dashed box in a.
c, d, Raman (c) and photoluminescence (d) spectra at points 1, 2, 3 and 4,
and at the junctions 1-2, 2-3 and 3-4 indicated in b. e, Normalized
photoluminescence colour contour plot along a direction perpendicular to the
interfaces, where the white arrow indicates the growth direction.
f, Composite Raman intensity map of the heterostructure.
g, Photoluminescence intensity map of the MoS2 domains, at 1.84 eV.
h, Photoluminescence intensity map of the WS2 domains, at 1.97 eV.
i, Composite photoluminescence map of the heterostructure.
j, Electron diffraction pattern of the heterostructure. k-m, Atomic-resolution
HAADF-STEM images of a MoS2-WS2 interface (k), pure WS2 (l) and pure
MoS2 (m) regions. n, Atomic ball model superimposed on the HAADF-STEM
image of the interface. o, Electron intensity profile along the white box in k.
Scale bars, 10 μm (a, b, f-i); 1 nm (k-m).

We further extended the use of the one-pot approach to produce
sequential lateral heterostructures of sulfide monolayers (MoS2-WS2)
(Extended Data Figs 6, 7). Figure 2a shows the optical image
of a three-junction heterostructure (MoS2-WS2-MoS2-WS2). Its
corresponding scanning electron microscopy (SEM) image (Fig. 2b)
reveals the coexistence of alternating MoS2 (dark contrast) and WS2
(bright contrast) domains. The Raman spectra acquired at different
positions (Fig. 2c), as well as the Raman maps (Fig. 2f), also confirm
the sequential formation of MoS2 and WS2 domains. Regions 1 and 3
exhibit phonon modes at 384 cm^-1 (E2g^1) and 405 cm^-1 (A1g) that are
consistent with monolayer MoS2 (ref. 23), whereas the WS2 regions
(2 and 4) present the characteristic first-order (E2g^1 at 355 cm^-1 and A1g
at 418 cm^-1) and second-order (most intense at 350 cm^-1, 2LA(M))
Raman peaks. At the interfaces (1-2, 2-3 and 3-4), the Raman spectra
are mostly a superposition of the vibrational modes of
both the MoS2 and WS2 domains (Supplementary Table 1). Single
photoluminescence peaks associated with direct excitonic emission from
monolayers were observed for the MoS2 (1.84 eV) and WS2 (1.97 eV)
domains (Fig. 2d). The corresponding photoluminescence intensity
maps at 1.84 eV (Fig. 2g) and 1.97 eV (Fig. 2h), and the composite image
(Fig. 2i), show that the photoluminescence emission is homogeneous
within each domain. The photoluminescence line scan across the junctions
(Fig. 2e) also displays the modulation of the optical bandgap along
the heterostructure, with sharp discontinuities at the junctions. At the
interfaces, the photoluminescence spectra show the superposition of
two well-resolved peaks corresponding to the simultaneous excitation
of the MoS2 and WS2 domains. For the MoS2→WS2 interfaces 1-2
and 3-4, and around the MoS2 domain, these photoluminescence
peak positions are slightly blue-shifted, by 25 meV and 10 meV,
respectively. No photoluminescence shifts were observed at the WS2→MoS2
interface (2-3), which is consistent with the results obtained for
selenide-based junctions. Z-contrast images from the inner regions of
each domain (Fig. 2l, m) confirm the high purity of the individual MoS2
and WS2 domains. The high quality and single-crystalline nature of
these interfaces produced by lateral epitaxy were also verified by electron
diffraction (Fig. 2j) and Z-contrast STEM imaging (Fig. 2k-o).
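At the MoS2-WS2 interfaces, the photoluminescence spectrum is described as a superposition of two well-resolved peaks. The following minimal sketch splits such a spectrum into MoS2-like and WS2-like contributions by linear least squares. It assumes Gaussian reference lineshapes centred at the domain peak energies quoted above (1.84 and 1.97 eV) with a hypothetical common width `sigma`. The function names and the synthetic test spectrum are illustrative, not the authors' analysis.

```python
import math


def gaussian(e, center, sigma):
    return math.exp(-0.5 * ((e - center) / sigma) ** 2)


def decompose(energies, spectrum, centers=(1.84, 1.97), sigma=0.03):
    """Least-squares weights (a_mo, a_w) with spectrum ~ a_mo*g_MoS2 + a_w*g_WS2.

    The reference lineshapes are unit-height Gaussians at the domain peak
    energies; `sigma` (eV) is a hypothetical common width.
    """
    g1 = [gaussian(e, centers[0], sigma) for e in energies]
    g2 = [gaussian(e, centers[1], sigma) for e in energies]
    # Normal equations of the two-parameter linear least-squares problem.
    s11 = sum(u * u for u in g1)
    s22 = sum(v * v for v in g2)
    s12 = sum(u * v for u, v in zip(g1, g2))
    b1 = sum(u * y for u, y in zip(g1, spectrum))
    b2 = sum(v * y for v, y in zip(g2, spectrum))
    det = s11 * s22 - s12 * s12
    if det <= 1e-12 * s11 * s22:
        raise ValueError("reference lineshapes are not separable on this grid")
    a_mo = (b1 * s22 - b2 * s12) / det  # Cramer's rule
    a_w = (s11 * b2 - s12 * b1) / det
    return a_mo, a_w


# Synthetic interface spectrum: 30% MoS2-like plus 70% WS2-like emission.
grid = [1.70 + 0.002 * k for k in range(201)]
synthetic = [0.3 * gaussian(e, 1.84, 0.03) + 0.7 * gaussian(e, 1.97, 0.03) for e in grid]
print(decompose(grid, synthetic))
```

Because the centres are fixed, this sketch cannot capture the 10-25 meV blue shifts reported at some interfaces. Fitting the centres as well would turn it into a nonlinear fit.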
The morphology of the sulfide-based heterostructures involves MoS2
cores with a truncated triangular geometry containing two types of
zig-zag edge: metal-terminated (Mo-zz) and chalcogen-terminated
(S-zz). The WS2 domains grow preferentially along one of these MoS2
edges, leading to WS2 sections with a convex isosceles trapezoid shape.
The subsequent MoS2 growth follows the same pattern. The shape
of a two-dimensional TMD crystal is determined by the relative
growth rates of its different edges. Experimental and theoretical
studies have shown that edge stability in MX2 TMDs depends on the
gas environment, the M:X atomic ratio and the growth temperature.
A chalcogen-deficient environment promotes the formation of M-zz
edges, whereas chalcogen-rich environments favour the stability of the
X-zz edges. The distinct geometries observed in selenide- and
sulfide-based heterostructures could originate from the different
vaporization rates of selenium and sulfur, as well as from the stability of
the chalcogen supply during growth. This hypothesis is discussed in more
detail in Methods.

TMD ternary alloys have received increasing attention owing to
their composition-dependent electronic properties and their potential
to further expand the range of available two-dimensional materials
beyond the four primary binary compounds (MoS2, MoSe2, WS2 and
WSe2) (refs 27, 28). However, integrating different ternary alloys into
a single-crystal heterostructure has not yet been achieved. The
versatility of the one-pot approach allowed us to fabricate sequential
multi-junction heterostructures based on ternary alloys
(MoS2(1-x)Se2x-WS2(1-x)Se2x). To this end, solid sources containing
either MoSe2 + WS2 or MoS2 + WSe2 were used. Figure 3a, b shows
optical images of two distinct alloy-based lateral heterostructures
(ALH1 and ALH2) with three junctions (Extended Data Figs 8-10).
The corresponding photoluminescence maps (Fig. 3c, d) are consistent
with different S:Se ratios. The MoS2(1-x)Se2x and WS2(1-x)Se2x domains in
ALH1 exhibit single photoluminescence peaks at 1.61 eV and 1.71 eV,
respectively, whereas for ALH2 the Mo-rich and the W-rich domains
have photoluminescence emissions at 1.67 eV and 1.8 eV, respectively.
Figure 3 | Synthesis of three-junction lateral heterostructures based
on MoX2-WX2, with X2 = S2(1-x)Se2x. a, b, Optical images of three-junction
heterostructures composed of MoS0.64Se1.36-WS0.68Se1.32 (ALH1, a) and
MoS1.04Se0.96-WS1.08Se0.92 (ALH2, b). c, d, Corresponding composite
photoluminescence maps of ALH1 at 1.61 eV and 1.71 eV (c) and ALH2
at 1.6 eV and 1.8 eV (d). e, Normalized photoluminescence colour
contour plot for ALH2 along a direction perpendicular to the interfaces;
λexc = 633 nm. The inset shows a typical SEM image of ALH2; the width of
the image corresponds to 24 μm. f, Atomic-resolution HAADF-STEM image
of a WS2(1-x)Se2x domain of ALH2. g, Electron intensity profile along the
white line indicated in f. h, Magnified image of the region enclosed by
the box in f, showing the different configurations of chalcogen sites.
Scale bars, 10 μm (a-d).

Figure 4 | Electrical characterization of the heterostructures.
a, Micrograph of a MoSe2-WSe2 single junction grown by chemical vapour
deposition, displaying the configuration of titanium and gold contacts
used for the electrical characterization of the individual WSe2 and MoSe2
domains as well as the electrical transport across their junction. An
exfoliated crystal of hexagonal boron nitride (h-BN) was transferred onto
the lower edge of the junction to isolate contacts 1 and 5 from the WSe2
edge, as these contacts are designed to probe only the MoSe2 domain.
The properties of the WSe2 domain are probed through contacts 2 and
3 or 3 and 4. b, Typical drain-to-source current Ids as a function of the
gate voltage Vbg for the WSe2 (green) and the MoSe2 (orange) domains.
The WSe2 domain displays current mainly at negative gate voltages, that
is, hole-doped-like transport, whereas the MoSe2 domain displays an
electron-doped-like response. The inset plots Ids as a function of the bias
voltage Vds, showing a nearly linear dependence on Vds when Vbg = 0 V.
This indicates thermionic emission of charge carriers across the Schottky
barriers located at the electrical contacts. c, Ids as a function of Vds across
the MoSe2-WSe2 interface, displaying a typical diode-like response which
becomes more prominent under illumination (Vbg = 0 V). The inset shows
a sketch of the MoSe2-WSe2 domains, their interface, and respective
band alignments. d, Photoinduced current Iph = Ids - Idark, where Ids is
the current observed under illumination and Idark is the current observed
under dark conditions, as a function of the illumination power P. The
red line is a linear fit, indicating a linear dependence of Iph on P at high
bias voltages. e, Ids as a function of Vbg for a WS2 (black) and a MoS2
(brown) domain. Whereas WS2 behaves as a hole-doped compound, MoS2
displays ambipolar behaviour, albeit with a more pronounced electron-like
response. The inset shows a micrograph of the MoS2-WS2 single-junction
device showing the configuration of contacts used to evaluate individual
domains and their interface. f, Ids as a function of Vds across the MoS2-WS2
interface, showing the characteristic diode-like response. The inset shows
a sketch of the MoS2-WS2 domains, their interface, and respective band
alignments.

The photoluminescence line scan across ALH2 (Fig. 3e) shows that
the position of the photoluminescence peak for each domain remains
constant, with sharp discontinuities at the interfaces. TEM analysis
confirms that the individual domains are ternary alloys of
MoS2(1-x)Se2x or WS2(1-x)Se2x. Figure 3f shows a Z-contrast TEM image
from a WS2(1-x)Se2x domain. The differences in scattered electron
intensities (Fig. 3g) associated with the metal sites (tungsten in this
case) and with three distinct combinations of the chalcogen atoms
(S2, Se2 or SSe) were used to identify the elemental configurations at
the different atomic positions within the crystal (Fig. 3h). The
concentration x in each domain was calculated from the measured
photoluminescence peak positions according to Vegard's law with a
bowing term:

$$E_g(\mathrm{MS}_{2(1-x)}\mathrm{Se}_{2x}) = (1-x)\,E_g(\mathrm{MS_2}) + x\,E_g(\mathrm{MSe_2}) - b\,x(1-x),$$

where M = Mo or W, and the bandgap bowing parameters are b = 0.05
and b = 0.04 for the Mo-based and W-based alloys, respectively. The
calculated compositions for ALH1 are MoS0.64Se1.36 (x = 0.68) and
WS0.68Se1.32 (x = 0.66). Similarly, compositions of MoS1.04Se0.96
(x = 0.48) and WS1.08Se0.92 (x = 0.46) were obtained for ALH2. Notably,
complete miscibility of S and Se was achieved within each individual
MoS2(1-x)Se2x and WS2(1-x)Se2x domain, as the photoluminescence
peak positions are constant within the domains. This is the first
demonstration, to our knowledge, of the controlled synthesis of an
alloy-based lateral heterostructure composed of multiple junctions.
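The composition x follows from inverting the bowed Vegard relation, which is quadratic in x. A minimal sketch is given below. The bowing parameters follow the text. The binary endpoint gaps are an assumption of this sketch: the single-domain photoluminescence energies quoted in this article (MoS2 1.84, MoSe2 1.52, WS2 1.97, WSe2 1.60 eV) stand in for them, because the values the authors used are not stated. The function names are illustrative.

```python
import math

# (Eg of MS2, Eg of MSe2, bowing b) in eV. The endpoint values are the
# single-domain PL peak energies quoted in this article; using them as the
# binary bandgaps is an assumption of this sketch. b follows the text.
ENDPOINTS = {"Mo": (1.84, 1.52, 0.05), "W": (1.97, 1.60, 0.04)}


def bandgap(x, metal):
    """Eg of MS2(1-x)Se2x from Vegard's law with a bowing term."""
    eg_s, eg_se, b = ENDPOINTS[metal]
    return (1 - x) * eg_s + x * eg_se - b * x * (1 - x)


def se_fraction(e_pl, metal):
    """Invert bandgap(x) = e_pl for the Se fraction x in [0, 1]."""
    eg_s, eg_se, b = ENDPOINTS[metal]
    # Rearranged: b*x**2 + (eg_se - eg_s - b)*x + (eg_s - e_pl) = 0
    qa, qb, qc = b, eg_se - eg_s - b, eg_s - e_pl
    if qa == 0:
        x = -qc / qb
    else:
        disc = qb * qb - 4 * qa * qc
        if disc < 0:
            raise ValueError("no real solution")
        x = (-qb - math.sqrt(disc)) / (2 * qa)  # smaller root lies in [0, 1]
    if not -1e-9 <= x <= 1 + 1e-9:
        raise ValueError("PL energy lies outside the MS2-MSe2 range")
    return min(max(x, 0.0), 1.0)


for label, energy, metal in [("ALH1 Mo-rich", 1.61, "Mo"), ("ALH1 W-rich", 1.71, "W"),
                             ("ALH2 Mo-rich", 1.67, "Mo"), ("ALH2 W-rich", 1.80, "W")]:
    print(f"{label}: x = {se_fraction(energy, metal):.3f}")
```

With these assumed endpoints, the four photoluminescence energies give x of about 0.685, 0.679, 0.492 and 0.433. These lie within about 0.03 of the compositions reported above. The residual differences presumably reflect the binary bandgaps actually used in the paper.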

Figure 4 displays a detailed electrical characterization of single
junctions composed of MoSe2 and WSe2 domains, as well as of MoS2 and
WS2 domains, grown by chemical vapour deposition. For the different
samples, we used distinct contact configurations that allow us to
characterize the individual domains as well as the electrical transport across
their interface (Fig. 4a). We find that the WSe2 and WS2 domains show
a hole-doped-like response when contacted with gold on titanium,
which is attributable to Fermi-level pinning close to their valence
bands (Fig. 4b, e). By contrast, the MoSe2 and MoS2 domains display a
pronounced electron-doped-like response, given that gold on titanium
is expected to pin the Fermi level closer to their conduction bands.
Equally important, the current-voltage characteristics of,
for example, the individual MoSe2 and WSe2 domains display a nearly
linear response (Fig. 4b, inset). This indicates that thermionic emission
allows charge carriers to pass the Schottky barriers, that is, the
misalignment between the bands of the semiconducting channel and those
of the metallic contacts. Hence, any nonlinearity observed for
currents flowing across the MoSe2-WSe2 junction (Fig. 4c) cannot be
attributed to these Schottky barriers. In fact, the current-voltage
characteristics across the junction display a typical rectifying, or
diode-like, response, indicating the formation of a well-defined p-n junction.
Additionally, as expected for a diode, illumination of the junction area
leads to pronounced photoinduced currents (Fig. 4c, d). Figure 4e, f
indicates that the MoS2-WS2 junctions show an overall response similar
to that of the MoSe2-WSe2 junctions, that is, a clear diode-like
response or a well-defined p-n junction, although for this particular
sample the current-voltage characteristics display a more pronounced
nonlinearity. All domains show ON/OFF current ratios between 10°
and 10° with relatively modest threshold gate voltages, that is, below
Vbg = 10 V when Ids is plotted against Vbg on a logarithmic
scale. This behaviour is comparable to that of samples fabricated from
exfoliated single crystals, suggesting similar crystallinity.
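The Fig. 4d analysis involves two steps. First, the photoinduced current is defined as Iph = Ids - Idark. Second, a straight line is fitted to Iph as a function of illumination power. The sketch below implements both with an ordinary least-squares fit. The numbers are made up purely to exercise the code and are not measured data. The function names and the choice of units are assumptions.

```python
def photocurrent(i_light, i_dark):
    """Iph = Ids(illuminated) - Idark, point by point."""
    if len(i_light) != len(i_dark):
        raise ValueError("current lists must have equal length")
    return [a - b for a, b in zip(i_light, i_dark)]


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical, made-up numbers purely to exercise the code (not measured data):
power_uW = [1.0, 2.0, 3.0, 4.0]
i_dark = [1.0e-9] * 4
i_light = [1.0e-9 + 2.0e-9 * p for p in power_uW]
slope, intercept = linear_fit(power_uW, photocurrent(i_light, i_dark))
print(f"slope = {slope:.3e} A/uW, intercept = {intercept:.1e} A")
```

A near-zero intercept together with small residuals is what a linear dependence of Iph on P looks like in such a fit.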

The synthetic method developed here differs from previous approaches,
and is versatile and scalable. The continuous assembly of planar
multi-junctions by controlled sequential edge-epitaxy may allow the
realization of periodic one-dimensional quantum wells and planar
superlattices. The controlled, sequential integration of alloy-based
two-dimensional materials with tuned optical properties is another step
forward, which could widen the range of possible material combinations
for the design of spectrally selective two-dimensional heterogeneous
materials for optoelectronic applications.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Synthesis. All in-plane lateral heterostructures were synthesized by water-assisted
thermal evaporation from solid sources at atmospheric pressure, in a chemical
vapour deposition system developed in-house. Bulk powders of MoSe2 (99.9%,
Sigma-Aldrich), WSe2 (99.9%, Sigma-Aldrich), MoS2 (99.9%, Sigma-Aldrich)
or WS2 (99.9%, Sigma-Aldrich) were used directly in different combinations for
the synthesis of four main types of heterostructure: MoSe2-WSe2 (150 mg);
MoS2-WS2 (150 mg); MoSe0.96S1.04-WSe0.92S1.08 (MoSe2 + WS2 sources, 150 mg);
and MoS0.64Se1.36-WSe1.32S0.68 (MoS2 + WSe2 sources, 150 mg). For the growth
of MoX2-WX2 (where X = S, Se), powder sources containing MoX2 and WX2 in a
ratio of 2:1 were placed side by side within an alumina boat (L × W × H:
70 × 14 × 10 mm) in the centre of a 1-inch-diameter horizontal quartz tube
furnace. Si substrates with a 300 nm SiO2 layer were pre-cleaned with acetone,
isopropanol and deionized water. During growth, the substrates were placed
downstream at temperatures between 700 and 810 °C, 6-10 cm away from the
solid sources at 1,060 °C. Initially, the temperature of the furnace was slowly
raised to 1,060 °C over 50 min under a constant flow of N2 (200 standard cubic
centimetres per minute, s.c.c.m.), with both substrates and sources kept outside
the furnace. When the furnace temperature exceeded 1,040 °C, the solid
precursors and the substrates were moved to their respective positions by
sliding the quartz tube into the furnace, and simultaneously water vapour was
introduced in a controlled manner by diverting the N2 flow through a bubbler
(Sigma-Aldrich) containing 2 ml of deionized water at room temperature. To
switch the growth from Mo-rich to W-rich compounds, resulting in a lateral
heterostructure, the N2 + H2O vapour flux was rapidly replaced by a mixture of
Ar + 5% H2 (200 s.c.c.m.). Similar growth conditions were employed for the
growth of heterostructures with other compositions. Once the desired
heterostructure sequence was completed, the synthesis was abruptly terminated
by sliding the quartz tube containing both the precursors and the substrates to
a cooler zone, while keeping a constant 200 s.c.c.m. flow of Ar + H2 (5%) until
it cooled to room temperature.

Raman and photoluminescence spectroscopy. The Raman and photoluminescence
experiments were performed in a confocal-microscope-based Raman spectrometer
(LabRAM HR Evolution, Horiba Scientific) in backscattering geometry.
Excitation wavelengths of 532 nm and 632 nm were used (laser power at the
sample, 77 μW), focused with a 100× objective (numerical aperture 0.9, working
distance 0.21 mm). During photoluminescence and Raman mapping, the optical
path was kept stationary while the sample was moved on a computer-controlled
motorized XY stage.
Transmission electron microscopy. HAADF-STEM imaging was carried out on
an aberration-corrected JEOL JEM-ARM200cF with a cold-field-emission gun at
200 kV. The STEM resolution of the microscope is 0.78 Å. The HAADF-STEM
images were collected with the JEOL HAADF detector using the following
experimental conditions: probe size 7C, condenser lens aperture 30 μm, scan
speed 32 μs per pixel and camera length 8 cm, which corresponds to a probe
convergence angle of 21 mrad and an inner collection angle of 76 mrad.

Device fabrication. To fabricate the electrical contacts to individual layers within
MoX2 and WX2 domains, 80 nm of gold was deposited onto an 8-nm-thick layer
of titanium via e-beam evaporation. Contacts were patterned using standard
e-beam lithography techniques. After gold deposition, and in order to remove
adsorbates, the samples were annealed under high vacuum for 24 h at 120 °C.
In the case of WSe2-MoSe2 heterojunctions, approximately 30-nm-thick
hexagonal boron nitride crystals (Momentive PolarTherm PT110) were
mechanically exfoliated from larger crystals before deposition and transferred
onto the heterostructure using a technique similar to that described in ref. 8.

Electrical characterization. Electrical characterization experiments were
performed using a source meter (Keithley 2612A). The source meter was controlled
via Labtracer2, free software available at https://www.tek.com/source-measure-units/2400-c-software/labtracer-28-unsupported.
For photocurrent measurements, a Coherent Sapphire 532-150 CW CDRH and a
Thorlabs DLS146-101S were used, with a continuous-wave wavelength λ of 532 nm.
Light was transmitted to the sample through a 10-μm single-mode optical fibre
with a mode field diameter of 10 μm. The size of the laser spot was also
measured against a fine grid. The diode-like electrical responses were fitted
using the Shockley diode equation in the presence of a series resistor Rs
(ref. 8):

$$I_{ds} = \frac{\eta V_T}{R_s}\, W_0\!\left\{\frac{I_0 R_s}{\eta V_T}\exp\!\left(\frac{V_{ds} + I_0 R_s}{\eta V_T}\right)\right\} - I_0,$$

where VT is the thermal voltage at a temperature T, I0 is the reverse-bias
saturation current, η is the diode ideality factor (η = 1 for an ideal diode)
and W0{x} is the principal branch of the Lambert W function. An I0 value of
the order of 10°! A yields diode ideality factors ranging from approximately
3.2 to 4.5, while yielding reasonable values for the series resistance Rs, that
is, between approximately 2.5 and 4.5 MΩ. We find that good fits are obtained
when I0 is allowed to decrease to values below the noise floor of the
measurements, approaching at least 10° 1° A. This uncertainty in the value of
I0 has no effect on the values of η or Rs. The results of the electrical
characterization experiments are shown in detail in Fig. 4.
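The Shockley equation with a series resistor, solved explicitly through the Lambert W function as described above, can be evaluated numerically. The sketch below rests on several assumptions. It takes the thermal voltage as about 25.85 mV (room temperature). It uses illustrative parameters: an ideality factor of 4 and Rs = 3 MΩ, both inside the ranges quoted in the text, and I0 = 1 pA as an arbitrary placeholder. `lambert_w0_from_log` is a small hypothetical Newton solver, not a library call; where SciPy is available, `scipy.special.lambertw` could be used instead.

```python
import math


def lambert_w0_from_log(log_z, tol=1e-14, max_iter=100):
    """Principal-branch Lambert W of z = exp(log_z), for real z > 0.

    Taking ln(z) instead of z avoids overflow from the large exponentials in
    the diode equation. Solves w + ln(w) = log_z with Newton's method.
    """
    if log_z < -40.0:
        return math.exp(log_z)  # W(z) = z - z**2 + ..., and z < 5e-18 here
    # Initial guesses: W(z) ~ z for small z; W ~ L - ln(L) for large L = ln z.
    w = log_z - math.log(log_z) if log_z > 1.0 else math.exp(log_z)
    for _ in range(max_iter):
        w_new = w * (1.0 + log_z - math.log(w)) / (1.0 + w)  # Newton step
        if abs(w_new - w) <= tol * w_new:
            return w_new
        w = w_new
    return w


def diode_current(v_ds, i0, ideality, r_s, v_t=0.025852):
    """Shockley diode with series resistance r_s, in explicit Lambert-W form:

    I = (n*Vt/Rs) * W0[(I0*Rs/(n*Vt)) * exp((V + I0*Rs)/(n*Vt))] - I0
    """
    n_vt = ideality * v_t
    log_z = math.log(i0 * r_s / n_vt) + (v_ds + i0 * r_s) / n_vt
    return n_vt / r_s * lambert_w0_from_log(log_z) - i0


# Illustrative parameters only (not fitted values): I0 = 1 pA, n = 4, Rs = 3 MOhm.
for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"V = {v:+.1f} V -> I = {diode_current(v, 1e-12, 4.0, 3.0e6):.3e} A")
```

The explicit form avoids iterating on the implicit equation I = I0[exp((V - I Rs)/(η VT)) - 1] inside a fitting loop. This is why it is convenient when fitting I0, η and Rs to measured curves.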


Growth mechanism. A preliminary study was performed to evaluate the interaction
mechanism between water vapour and MoX2 as well as WX2 bulk powders.
By allowing the solid precursors to interact with water vapour at 1,060 °C for a
prolonged time (at least 20-30 min), we found that different Mo (or W) oxide
phases evolve, which are assumed to be the main driving force behind the selective
growth of the individual compounds. The Raman spectra (Extended Data Fig. 5a)
clearly show that MoO2 is the dominant phase evolving during the oxidation of
MoS2 or MoSe2. The Raman spectrum of a partially oxidized cluster shows the
presence of both MoSe2 (or MoS2) and MoO2 phases, whereas for completely
oxidized domains only frequencies at 126, 203, 228, 347, 363, 458, 496, 570 and
742 cm^-1 were observed; this agrees well with the Raman spectrum of MoO2
(ref. 31). A previous report also confirmed that the main solid product of
MoS2 oxidation under water vapour at temperatures greater than 1,000 °C is MoO2
(ref. 32) rather than MoO3, which tends to be a stable phase under various reactive
gas environments. Indeed, in our experiments, the overall oxidation reaction
between MoX2 and water at 1,060 °C led to the formation of MoO2 (reaction (1)).
Furthermore, the weight loss of Mo oxides was found to be very rapid in the
presence of H2O vapour (Extended Data Fig. 5c). Taking this into account, the
sublimation of MoO2 presumably proceeds very rapidly at 1,060 °C, and the
vapour is subsequently transported to, and saturates on, the substrate at
relatively lower temperatures. The recondensed MoO2 vapour interacts with the
H2X already present (as a by-product of oxidation) to form MoX2 at temperatures
ranging from 650 to 800 °C (reaction (2)). This leads to the formation of MoX2
domains. Notably, the growth of MoX2 can be abruptly terminated by changing
the carrier gas from wet nitrogen to dry argon with 5% hydrogen, which rapidly
depletes the source of MoO2 vapour owing to its reduction by hydrogen to
metallic molybdenum at the surface of the source (reaction (3)). Unfortunately,
the exact transport phases could not be detected because of constrained
access to the reaction tube of the chemical vapour deposition system under the
conditions used for the growth of the TMD heterostructures. Therefore, only the
most important reaction equations were derived:


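$$\mathrm{MoX_2(s)} + 4\,\mathrm{H_2O(g)} \rightarrow \mathrm{MoO_2(s)} + \mathrm{H_2X(g)} + \mathrm{XO_2(g)} + 3\,\mathrm{H_2(g)}, \quad \text{where X = S or Se} \tag{1}$$

$$\mathrm{MoO_2(g)} + 2\,\mathrm{H_2X(g)} \rightarrow \mathrm{MoX_2(s)} + 2\,\mathrm{H_2O(g)}, \quad \text{where X = S or Se} \tag{2}$$

$$\mathrm{MoO_2(s)} + 2\,\mathrm{H_2(g)} \rightarrow \mathrm{Mo(s)} + 2\,\mathrm{H_2O(g)} \tag{3}$$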
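Every element should be conserved in reactions (1)-(3). The small atom-counting sketch below checks this for both chalcogens. It only parses bracket-free formulas such as MoS2 or H2Se, so a species like (WO3)n would have to be written per formula unit. It accepts fractional coefficients through `fractions.Fraction`. The helper names are illustrative and not part of any chemistry library.

```python
import re
from collections import Counter
from fractions import Fraction

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def atom_counts(formula: str) -> Counter:
    """Atoms in a simple formula without brackets, e.g. 'MoS2' or 'H2Se'."""
    if not FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for element, n in TOKEN.findall(formula):
        counts[element] += int(n) if n else 1
    return counts


def side_total(side):
    """Sum atoms over (coefficient, formula) pairs; coefficients may be Fractions."""
    total = Counter()
    for coeff, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += Fraction(coeff) * n
    return total


def is_balanced(lhs, rhs) -> bool:
    return side_total(lhs) == side_total(rhs)


def with_chalcogen(side, x):
    """Substitute the generic chalcogen 'X' by 'S' or 'Se'."""
    return [(c, f.replace("X", x)) for c, f in side]


REACTIONS = {
    1: ([(1, "MoX2"), (4, "H2O")], [(1, "MoO2"), (1, "H2X"), (1, "XO2"), (3, "H2")]),
    2: ([(1, "MoO2"), (2, "H2X")], [(1, "MoX2"), (2, "H2O")]),
    3: ([(1, "MoO2"), (2, "H2")], [(1, "Mo"), (2, "H2O")]),
}

for number, (lhs, rhs) in REACTIONS.items():
    for x in ("S", "Se"):
        ok = is_balanced(with_chalcogen(lhs, x), with_chalcogen(rhs, x))
        print(f"reaction ({number}), X = {x}: {'balanced' if ok else 'NOT balanced'}")
```

The same helpers apply unchanged to the tungsten reactions that follow.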


By contrast, WX2 shows different oxidation and reduction behaviour under the above
conditions (Extended Data Fig. 5c, d), in which different WxOy oxide phases are
observed in the Raman spectra (Extended Data Fig. 5b and Supplementary Table 2).
In the case of WSe2, distinct oxide phases evolved upon reaction with different
reactive gas vapours for more than 20 min at 1,060 °C, as shown in Extended
Data Fig. 5b. The dominant phases observed in the Raman spectra are WO2
(refs 36, 38, 39) and W20O58 (ref. 37). This indicates that the oxidation state of W
is dominated by that within WO2 and W20O58 during the oxidation of WX2 in wet
nitrogen carrier gas, through a series of reaction steps (reactions (4)-(7)):


$$\mathrm{WX_2(s)} + 4\,\mathrm{H_2O(g)} \rightarrow \mathrm{WO_2(s)} + \mathrm{H_2X(g)} + \mathrm{XO_2(g)} + 3\,\mathrm{H_2(g)}, \quad \text{where X = S or Se} \tag{4}$$


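$$18\,\mathrm{WO_2(s)} + 13\,\mathrm{H_2O(g)} \rightarrow \mathrm{W_{18}O_{49}(s)} + 13\,\mathrm{H_2(g)} \tag{5}$$

$$10\,\mathrm{W_{18}O_{49}(s)} + 32\,\mathrm{H_2O(g)} \rightarrow 9\,\mathrm{W_{20}O_{58}(s)} + 32\,\mathrm{H_2(g)} \tag{6}$$

$$\mathrm{W_{20}O_{58}(s)} + (2 + 20n)\,\mathrm{H_2O(g)} \rightarrow 20\,\mathrm{WO_3}\cdot n\,\mathrm{H_2O} + 2\,\mathrm{H_2(g)}, \quad \text{where } 0 \le n \le 1 \tag{7}$$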


The formation of volatile WO3·nH2O or similar tungstate species in
N2 + H2O vapour cannot be ruled out; however, these species mostly condense
below 500 °C and hence might not participate in the growth process.
Similarly, in the presence of the reducing agent H2, the high-index W sub-oxides
undergo a series of phase transformations to low-index oxide phases (reactions
(8)-(10)):


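$$9\,\mathrm{W_{20}O_{58}(s)} + 32\,\mathrm{H_2(g)} \rightarrow 10\,\mathrm{W_{18}O_{49}(s)} + 32\,\mathrm{H_2O(g)} \tag{8}$$

$$\mathrm{W_{18}O_{49}(s)} + 13\,\mathrm{H_2(g)} \rightarrow 18\,\mathrm{WO_2(s)} + 13\,\mathrm{H_2O(g)} \tag{9}$$

$$\tfrac{3}{2}\,\mathrm{WO_2(s)} \rightarrow \tfrac{1}{n}\,\mathrm{(WO_3)}_n\mathrm{(g)} + \tfrac{1}{2}\,\mathrm{W(s)} \tag{10}$$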


The different (WO3)n(g) species can be formed alongside the reduction process,
and subsequently transported as vapour to the growth substrate (reaction (11));
the corresponding reaction of WO2 is shown as reaction (12):

$$\tfrac{1}{n}\,\mathrm{(WO_3)}_n\mathrm{(g)} + 2\,\mathrm{H_2X(g)} + \mathrm{H_2(g)} \rightarrow \mathrm{WX_2(s)} + 3\,\mathrm{H_2O(g)}, \quad \text{where X = S or Se} \tag{11}$$

or

$$\mathrm{WO_2(g)} + 2\,\mathrm{H_2X(g)} \rightarrow \mathrm{WX_2(s)} + 2\,\mathrm{H_2O(g)}, \quad \text{where X = S or Se} \tag{12}$$


Reactions (8)-(10) and (11)-(12) can occur concurrently. The appearance of
different molybdenum and tungsten oxidation states can also be observed directly
from the colour changes of the solid precursors after exposure to different
gaseous environments: MoO2 (brown), W20O58 (blue), a violet colour indicating the
presence of W20O58 and W18O49 phases in a whisker-type morphology, and WO2
(chocolate brown) (Extended Data Fig. 5e-l).

There are important differences in the behaviour of molybdenum- and
tungsten-based compounds in the presence of water vapour. Firstly, the oxide
products of tungsten are relatively less volatile than those of the corresponding
molybdenum compounds. In addition, the high-index W sub-oxides (W20O58) are
less volatile and less readily oxidized to WO3. This vapour-phase modulation of the
oxide species is the key driving force for the observed sequential growth of lateral
heterostructures. The growth mechanism can therefore be summarized as follows.
The selective growth of MoX2 or WX2 monolayers can be achieved simply by
controlling the carrier-gas environment. N2 + H2O vapour (without H2) favours the
evaporation of both molybdenum and tungsten precursors, but only molybdenum
precursors are deposited on the substrate. An abrupt switch of the carrier gas to
Ar + H2 quickly depletes the supply of molybdenum precursors, while continuing
to supply tungsten precursors owing to the slower reduction rate of WOx. A more
detailed chemical analysis, including the type of gaseous by-products, in
conjunction with theoretical models is ongoing.

To further understand the role of molybdenum or tungsten oxides during the switch from one material domain to the other (such as MoX2 to WX2), and the extent of material diffusion across the interface when the carrier gas is changed from N2 + H2O to Ar + H2 during heterostructure fabrication, the oxidation-induced evaporation and the rapid reduction behaviour of different solid sources, including MoO3 and WO3, were evaluated independently at 1,060 °C (Extended Data Fig. 5c, d).

Case 1, in the presence of H2O. Extended Data Fig. 5c shows that the sublimation of MoO3 is almost instantaneous (97% weight loss in 2 min). By contrast, the sublimation of WO3 is very slow (approximately 2% weight loss in 2 min) and linear. This is further supported by the observation that the weight loss of MoSe2, which is otherwise linear, is around three times higher than that of WSe2 over a 10-min interaction with H2O. This shows that, in the presence of H2O, Mo-oxide vapours dominate over W-oxide vapours in the reaction zone. It can be concluded that, in the presence of water vapour, the oxide products of tungsten are less volatile than the corresponding molybdenum compounds. In fact, the slower oxidation of tungsten compounds might aid the formation of tungsten oxide hydroxide (WO3·xH2O) species, which generally condense below 500 °C. Hence, an H2O environment favours the growth of MoX2 domains only.

Case 2, in the presence of H2 reducing gas. MoO3 undergoes rapid phase transformation to different sub-oxide phases until it is completely reduced to Mo, via the steps MoO3 → Mo4O11 → MoO2 → Mo (Supplementary Table 3). A weight loss of around 75% was observed in 10 min. In a similar time frame, however, WO3 undergoes a linear transformation to different sub-oxide phases via WO3 → WnO3n−1 → WnO3n−2 (W20O58) → W18O49 → WO2 (Supplementary Table 3). A maximum weight loss of 8.5% was observed in 10 min, which is almost 9 times slower than the reduction of MoO3. This indicates that, when the carrier gas is switched from H2O to H2, the residual MoO2 is reduced instantaneously, whereas the supply of W sub-oxides is maintained. In addition, the leaching of W sub-oxides by H2 is more rapid than their reduction to lower W sub-oxides, thus contributing to the growth of the WSe2 domain.
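The relative rates quoted in Cases 1 and 2 follow from simple arithmetic on the reported weight losses. The sketch below assumes, for illustration only, that each weight loss accrues at a constant average rate over the stated interval; the function name is hypothetical.

```python
def mean_rate(percent_loss, minutes):
    """Average weight-loss rate in % per minute, assuming a constant rate."""
    return percent_loss / minutes

# Case 1 (H2O): 97% (MoO3) vs ~2% (WO3) weight loss in 2 min
case1 = mean_rate(97, 2) / mean_rate(2, 2)

# Case 2 (H2): ~75% (MoO3) vs 8.5% (WO3) weight loss in 10 min
case2 = mean_rate(75, 10) / mean_rate(8.5, 10)

print(f"Case 1: MoO3 loses mass ~{case1:.1f} times faster than WO3")
print(f"Case 2: MoO3 is reduced ~{case2:.1f} times faster than WO3")
```

The Case 2 ratio (about 8.8) corresponds to the "almost 9 times slower" reduction of WO3 stated above.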

From the above observations, it can be concluded that H2O vapour favours the growth of the MoSe2 domain because molybdenum oxides dominate the population in the reaction chamber. The rapid reduction of MoO3 (Extended Data Fig. 5d) indicates that the rate of MoSe2 oxidation is equal to the rate of MoO3 sublimation, meaning that all the MoO3 formed during the interaction of H2O with MoSe2 sublimes instantly. This was further confirmed during the oxidation of MoSe2, in which we did not encounter any signatures of higher Mo-oxide phases. On the other hand, H2O vapour favours the continuous oxidation of the WSe2 precursor to higher sub-oxide phases of W, and the typical timescale of growth of the MoSe2 domain does not apply in this case. However, any higher W sub-oxides that form during WSe2 oxidation, such as W20O58 or WO3, can effectively capture an H2O molecule and form tungsten oxide hydroxide (WO3·H2O), which is very volatile and hence condenses only below 500 °C.
The different interfaces formed during the transition from one material to the other are a consequence of the different oxidation and reduction rates of molybdenum- and tungsten-based compounds, as well as of the gas-switching mechanism. When the carrier gas switches from H2O to H2 (as a reducing agent), the residual Mo-oxide content depletes suddenly, as observed from the weight-loss plot. Because complete removal of H2O on switching to H2 is not possible in the present experimental setup, Mo is mildly co-deposited into the WX2 domain, forming a smooth interface (MoX2–WX2). Note that, during the continuous growth of the WX2 domain, the Mo oxide over the MoX2 source is completely reduced to metallic molybdenum. When the condition is reverted, that is, when the gas is changed from H2 to H2O vapour, the W sub-oxides proceed towards forming high-index W sub-oxides, as indicated by the slow weight loss of W-oxide precursors in H2O. Meanwhile, H2O restarts the initial oxidation step from the metallic molybdenum that formed at the MoX2 surface during the interaction with H2 gas. This forms MoO3 more slowly than the direct oxidation of the MoX2 source does, which may slightly delay the supply of MoO3 vapour to the existing WX2 edge sites and would account for the consistently sharp transition from the WX2 to the MoX2 domain.
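The switching logic described above can be summarized as a small state model: each carrier-gas step selects which domain grows at the edge, and the direction of each switch determines whether the resulting interface is smooth or sharp. This is a qualitative sketch of the mechanism as described in the text, not a growth simulation; the gas labels, table names and function name are assumptions made for illustration.

```python
# Carrier gas -> domain that grows at the edge (X = S or Se), per the mechanism above
DOMAIN_FOR_GAS = {"N2+H2O": "MoX2", "Ar+H2": "WX2"}

# Interface character set by the direction of the switch
INTERFACE = {
    ("MoX2", "WX2"): "smooth (mild Mo co-deposition)",
    ("WX2", "MoX2"): "sharp (delayed MoO3 supply)",
}

def grow(schedule):
    """Return the domain sequence and interface types for a carrier-gas schedule."""
    domains = [DOMAIN_FOR_GAS[gas] for gas in schedule]
    # Only an actual change of domain creates an interface
    interfaces = [INTERFACE[(a, b)] for a, b in zip(domains, domains[1:]) if a != b]
    return domains, interfaces

domains, interfaces = grow(["N2+H2O", "Ar+H2", "N2+H2O", "Ar+H2"])
print(" -> ".join(domains))  # MoX2 -> WX2 -> MoX2 -> WX2
for kind in interfaces:
    print(kind)
```

A four-step schedule therefore yields a three-junction structure whose interfaces alternate between smooth (MoX2 to WX2) and sharp (WX2 to MoX2).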

Assignment of Raman modes of MoS0.64Se1.36–WSe1.32S0.68 lateral heterostructure. The compositional and spatial distribution of the S–Se alloy in the MoS2(1−x)Se2x–WS2(1−x)Se2x lateral heterostructures was examined using Raman measurements (Fig. 3a, b and Extended Data Figs 8, 10). The normalized Raman spectra in Extended Data Fig. 8c indicate that the MoX2- and WX2-related Raman branches are well separated and mostly consist of several intense peaks in the range 100–500 cm−1. The intense Raman peaks (Extended Data Fig. 8c) observed within domains 1 and 3 (Extended Data Fig. 8a) are related to an alloy phase of MoS2(1−x)Se2x (refs 28, 44). In general, the A1g and E2g modes in monolayer MoS2(1−x)Se2x show typical two-mode behaviour, which does not imply phase segregation. Splitting of the A1g mode has also been observed, which is attributed mainly to the mass difference between Se and S as well as to their spatial configuration around Mo atoms. Hence, the observed Raman spectra (Extended Data Fig. 8c) for the MoS2(1−x)Se2x monolayer domains have two distinct sets of features: MoS2-like features (E2g(S-Mo) at 370 cm−1 and A1g(S-Mo) at 400.5 cm−1) and MoSe2-like features close to 264 cm−1. Specifically, the peaks at 219 cm−1 and 264 cm−1 arise from splitting of the A1g mode of the MoSe2 phase into low- and high-frequency components, respectively, whereas the shifts of the MoS2-like A1g mode from 405 to 400.5 cm−1 and of the E2g mode from 385 to 370 cm−1 confirm the incorporation of Se at S lattice sites (ref. 45). Similarly, the normalized Raman spectra corresponding to domains 2 and 4 (Extended Data Fig. 8a) display several phonon modes typical of a WSe2xS2(1−x) alloy, which can be assigned to the modes A1g(Se-W) (256–259 cm−1), A1g(S-W) (404–406 cm−1), A1g(S-W-Se) (379–381 cm−1), E2g(S-W) (354 cm−1), A1g(M)−LA(S-W) (225 cm−1) and A1g(Se-W)−LA(Se-W) + E2g(S-W)−LA(S-W) (138–141 cm−1) (ref. 27). The observed red shift (around 12 cm−1) of the A1g(S-W) mode in a Se-rich environment, compared with that of the isotropic monolayer WS2 phase, and the corresponding hardening of the A1g(Se-W) mode clearly indicate the presence of Se/S alloy in the MoS2(1−x)Se2x and WS2(1−x)Se2x domains. However, the position of the E2g(S-W) mode does not change (±1 cm−1), which might be attributed to the weak coupling between the very weak E2g(Se-W) mode and the strong E2g(S-W) mode. This has been further confirmed by Raman intensity mapping, as shown in the composite image (individual component maps in Extended Data Fig. 8h–k). Even though the A1g(S-W) and A1g(S-Mo) peaks differ by only around 4 cm−1, the mapping provides clear in-plane differentiation between these two domains that matches the optical contrast of the heterostructure.
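The alloy formulas in the section headings map directly onto the x of the MoS2(1−x)Se2x and WS2(1−x)Se2x notation, because 2x is simply the Se count per formula unit. The sketch below, with hypothetical helper names, extracts that value from a formula string; it assumes one metal atom and two chalcogen atoms per formula unit.

```python
import re

def element_counts(formula):
    """Parse per-element counts, e.g. 'MoS0.64Se1.36' -> {'Mo': 1.0, 'S': 0.64, 'Se': 1.36}."""
    return {el: float(n) if n else 1.0
            for el, n in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula)}

def se_fraction(formula):
    """x in M S2(1-x) Se2x: Se count divided by the total chalcogen count."""
    c = element_counts(formula)
    return c.get("Se", 0.0) / (c.get("S", 0.0) + c.get("Se", 0.0))

for f in ("MoS0.64Se1.36", "WSe1.32S0.68"):
    print(f, round(se_fraction(f), 2))
```

For MoS0.64Se1.36 this gives x = 0.68, and for WSe1.32S0.68 it gives x = 0.66, so the Mo- and W-based domains of this heterostructure carry similar Se fractions.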

Assignment of Raman modes of MoSe0.96S1.04–WSe0.92S1.08 lateral heterostructure. Extended Data Fig. 10b shows the MoSe2(1−x)S2x- and WS2(1−x)Se2x-related Raman spectra at different regions of the heterostructure (including the junctions) corresponding to the optical image in Fig. 3b. The prominent peaks observed within domains 1 and 3 (Extended Data Fig. 10a) are mostly related to an alloy phase of MoSe2(1−x)S2x and can be assigned to MoS2-like peaks (A1g(S-Mo) modes (402.5 cm−1) and E2g(S-Mo) (371–374 cm−1)) and MoSe2-like peaks (high-frequency A1g(Se-Mo) modes (266–267 cm−1), low-frequency A1g(Se-Mo) modes (223 cm−1) and E2g(Se-Mo) (277–278 cm−1)). Similarly, Raman spectra collected from domains 2 and 4 display several modes that correspond to a typical WS2(1−x)Se2x alloy and can be assigned to the modes A1g(S-W) (211–213 cm−1), A1g(Se-W) (263 cm−1), E2g(S-W) (around 356–358 cm−1), A1g(M)−LA(S-W) (around 225 cm−1) and A1g(Se-W)−LA(Se-W) + E2g(S-W)−LA(S-W) (around 160 cm−1). The A1g(S-W) mode is red-shifted by approximately 4 cm−1, whereas the correspondingly large shift of the A1g(Se-W) mode is due to uniform S/Se alloying in these heterostructures. This is supported by the distinct photoluminescence spectra (Extended Data Fig. 10c, d) collected from the MoSe2(1−x)S2x and WS2(1−x)Se2x domains. The individual Raman and photoluminescence maps further confirm the seamless connectivity as well as the uniform distribution of the S/Se alloy within the triangular domains (Extended Data Fig. 10).
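The peak assignments listed for the MoSe2(1−x)S2x domains amount to a lookup table of frequency windows. As a hedged illustration, the sketch below encodes only the MoSe2(1−x)S2x assignments quoted above and labels a measured peak by the window it falls in; the ±1 cm−1 tolerance and all names are assumptions, not part of the original analysis.

```python
# Frequency windows (cm^-1) for the MoSe2(1-x)S2x domains, as quoted in the text
ASSIGNMENTS = [
    ((223.0, 223.0), "A1g(Se-Mo), low frequency"),
    ((266.0, 267.0), "A1g(Se-Mo), high frequency"),
    ((277.0, 278.0), "E2g(Se-Mo)"),
    ((371.0, 374.0), "E2g(S-Mo)"),
    ((402.5, 402.5), "A1g(S-Mo)"),
]

def assign_peak(shift, tolerance=1.0):
    """Return the label(s) whose window, widened by `tolerance`, contains `shift`."""
    return [label for (lo, hi), label in ASSIGNMENTS
            if lo - tolerance <= shift <= hi + tolerance]

for peak in (266.4, 372.8, 402.9, 320.0):
    print(peak, assign_peak(peak) or ["unassigned"])
```

Returning a list rather than a single label makes any overlap between widened windows visible instead of silently picking one assignment.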

Data availability. The datasets generated and/or analysed in the current study are 
available from the corresponding authors upon reasonable request, and are also 
included with the manuscript as Extended Data and Supplementary Information. 
Extended Data Figure 1 | One-pot synthetic approach for sequential edge-epitaxy of TMDs. a, Schematics of the modified chemical vapour deposition system that allows the alternate switching of carrier gas that generates the selective edge-epitaxial growth for multi-junction heterostructure synthesis. Note that water vapour is introduced by passing the carrier gas through the bubbler. The carrier gas is selected by a three-way valve placed at the entrance of the quartz tube reactor. b, Temperature profile of the furnace; a single heterogeneous source containing both precursors is placed in the high-temperature zone, whereas the substrates are placed downstream in the lower-temperature zone. c, Growth rates for MoSe2 and WSe2 domains as a function of the substrate temperature. The error bars along the y axis denote the mean standard deviation (±σ), and error bars along the x axis represent the average length of a typical growth region denoted in b. d, Atomic ball model, showing the material distribution across the heterostructure in cross-section (top) and plane view (bottom).
Extended Data Figure 2 | Growth of single-junction MoSe2–WSe2 lateral heterostructures. a–d, Optical images of single-junction MoSe2–WSe2 monolayer lateral heterostructures with different WSe2 lateral growth times of 80 s (a), 45 s (b), 30 s (c) and 15 s (d). The inset in d shows the Raman map of the narrow WSe2 shell, which is difficult to visualize in the optical image. e–g, Composite photoluminescence maps corresponding to the optical images in a–c, respectively, at 1.53 eV (MoSe2 domain) and 1.6 eV (WSe2 domain). h, Composite Raman map corresponding to the optical image in d, at frequencies 240 cm−1 (MoSe2 domain) and 250 cm−1 (WSe2 domain). i, Low-magnification optical images of the MoSe2–WSe2 single-junction heterostructure shown in b, obtained at different distances from the source precursor as mentioned in Extended Data Fig. 1b, c (regions II to IV). j, k, Photoluminescence spectra of monolayer (1L), bilayer (2L) and trilayer (3L) heterostructures; MoSe2 (j) and WSe2 (k) domains. Scale bars: a–h, 10 μm.
Extended Data Figure 3 | Optical properties of single-junction MoSe2–WSe2 lateral heterostructure. a, b, Raman spectra of MoSe2 and WSe2 domains from a single-junction MoSe2–WSe2 monolayer lateral heterostructure using 514 nm (a) and 613 nm (b) laser excitation. c, Raman spectra at the interfaces of the single-junction MoSe2–WSe2 lateral heterostructure. d, e, Composite Raman intensity maps at a frequency of 240 cm−1 (MoSe2 domain, d) and 250 cm−1 (WSe2 domain, e). f, Position map corresponding to the optical image in Extended Data Fig. 2a. g, Photoluminescence spectra of MoSe2 and WSe2 domains and at the interface of the MoSe2–WSe2 single-junction monolayer lateral heterostructure shown in Extended Data Fig. 2a. h–j, Photoluminescence intensity maps at 1.53 eV (MoSe2 domain, h) and 1.6 eV (WSe2 domain, i); the composite is shown in j. k, l, Peak width (in eV) map (k) and position (in eV) map (l) corresponding to the optical image in Extended Data Fig. 2a. Scale bars: d–l, 10 μm.
Extended Data Figure 4 | Optical properties of multi-junction MoSe2–WSe2 lateral heterostructure. a, Typical SEM image of a five-junction MoSe2–WSe2 monolayer lateral heterostructure corresponding to Fig. 1b. b, Optical microscope image of a large area of a five-junction MoSe2–WSe2 lateral heterostructure, corresponding to Fig. 1c, showing the conformal growth of the respective MoSe2 or WSe2 domains. c, d, Raman intensity maps of the five-junction MoSe2–WSe2 lateral heterostructure corresponding to Fig. 1b, g, at frequencies 250 cm−1 (c, WSe2 domain) and 240 cm−1 (d, MoSe2 domain). e, Composite Raman map image at 250 cm−1 and 240 cm−1. f, Optical image of a three-junction MoSe2–WSe2 monolayer lateral heterostructure corresponding to the inset of Fig. 1i. g, Raman peak position mapping between 236 and 255 cm−1. h, Composite photoluminescence intensity mapping at 1.53 eV (MoSe2 domain) and 1.6 eV (WSe2 domain). Scale bars: f–h, 10 μm.
Extended Data Figure 5 | Effect of water vapour and H2 on the solid sources (MoX2 and WX2). a, Raman spectral evolution of the MoO2 oxide phase from both MoSe2 and MoS2 solid sources upon reaction with a constant flow of N2 + H2O vapour for more than 20 min at 1,060 °C (Supplementary Table 2). b, Raman spectral evolution of different oxide phases of WX2 upon reaction with different reactive gas environments for more than 20 min at 1,060 °C, as follows. Only Ar + H2 (5%) through H2O (200 s.c.c.m.): the Raman spectrum is composed of WSe2, most likely with a Se-deficient surface, as well as a mixture of complex oxide phases, as indicated by the broad peak around 800 cm−1 (1); first partially oxidized by N2 + H2O (5 min), followed by Ar + H2 (5%) through H2O (200 s.c.c.m.) for 10 min: the dominant phase observed in the Raman spectrum is WO2 (2); completely oxidized by N2 + H2O flow for 20 min: the dominant phase observed in the Raman spectrum is W20O58 (3). c, d, Reduction of different metal oxide (MoO3 and WO3) and selenide (MoSe2 and WSe2) solid sources as a function of reaction time and carrier gas: in N2 + H2O (c) and Ar + H2 (5%) (d) flow conditions at 1,060 °C. The weight loss of MoO3 (38.5% in 2 min) is very rapid compared with that of WO3 (1% in 2 min). In contrast, the reduction rates of the MoSe2 and WSe2 solid precursors are almost linear during H2 exposure at high temperatures. During oxidation by H2O, however, the weight loss of MoSe2 is two and five times faster than that of WSe2 and WO3, respectively. e–h, A direct visualization of the reaction of MoSe2 can be gained from the change in colour of the source precursor under different conditions: bulk powder of MoSe2 (e); after reaction in Ar + H2 (5%) through H2O (200 s.c.c.m.) (f); after reaction in N2 through H2O (200 s.c.c.m.), where the chocolate brown indicates the MoO2 phase (g); the shiny surface indicates the presence of metallic molybdenum reduced from MoX2 along with the MoO2 phase (h). i–l, Different oxide phases of WX2 upon reaction with different reactive gas environments for more than 20 min at 1,060 °C. Bulk powder of WSe2 (i); only Ar + H2 (5%) through H2O (200 s.c.c.m.) (j, corresponding to spectrum 1 in b); first partially oxidized by N2 + H2O (5 min), followed by Ar + H2 (5%) through H2O (200 s.c.c.m.) for 10 min (chocolate brown, k, corresponding to spectrum 2 in b); completely oxidized by N2 + H2O flow for 20 min, where the dominant phase observed in the Raman spectra is W20O58 (blue-violet, l, corresponding to spectrum 3 in b). The insets of l show the high-magnification image (left) and the materials in an alumina boat (right).
Extended Data Figure 6 | Growth of multi-junction MoS2–WS2 lateral heterostructure. a–d, Optical images of MoS2–WS2 monolayer lateral heterostructures: single-junction (a), two-junction (b), three-junction (c) and five-junction (d). e, Typical low-magnification optical image of the five-junction structure shown in d. f, SEM images of the three-junction MoS2–WS2 lateral heterostructure shown in c. g, SEM image of a three-junction single island (Fig. 2b). h, Typical photoluminescence spectra from MoS2 and WS2 domains of the three-junction heterostructure shown in g. The strong photoluminescence intensity compared with that of the Raman A1g mode (over 300 times greater) indicates the monolayer nature as well as the high optical quality of the as-grown heterostructure.
Extended Data Figure 7 | Optical properties of MoS2–WS2 lateral heterostructures. a–c, Composite photoluminescence intensity mapping of single-junction (a), two-junction (b) and three-junction (c) MoS2–WS2 monolayer lateral heterostructures corresponding to the optical images in Extended Data Fig. 6a–c, respectively, at 1.84 eV (MoS2 domain) and 1.97 eV (WS2 domain). d, e, Raman intensity mapping at frequencies 351 cm−1 (d, WS2 domain) and 405 cm−1 (e, MoS2 domain). f, Photoluminescence position mapping corresponding to the optical image in Fig. 2a. g, SEM image of a three-junction MoS2–WS2 monolayer lateral heterostructure island. The high-magnification image of the boxed region, shown in the right panel, shows the lateral connectivity between the respective domains of MoS2 or WS2. Scale bars: a–g, 10 μm.
Extended Data Figure 8 | Growth of three-junction MoS0.64Se1.36–WSe1.32S0.68 lateral alloy heterostructure. a, Optical image of a three-junction MoS2(1−x)Se2x–WS2(1−y)Se2y monolayer lateral heterostructure. b, The corresponding low-magnification optical image of the heterostructure shown in a. c, d, Raman (c) and photoluminescence (d) spectra corresponding to the optical image in a between points 1–4. e, Normalized photoluminescence contour colour plot along the direction perpendicular to the interfaces, as shown in the optical image in the inset. f, g, Photoluminescence intensity maps at 1.61 eV (f, MoS0.64Se1.36 domain) and 1.71 eV (g, WSe1.32S0.68 domain) corresponding to the optical image in Fig. 3a. h–k, Raman intensity maps (Fig. 3a) at frequencies 400.5 cm−1 (A1g(S-Mo) modes, h), 264 cm−1 (A1g(Se-Mo) modes, i), 404 cm−1 (A1g(S-W) mode, j) and 256 cm−1 (A1g(Se-W) mode, k). Scale bars: f–k, 10 μm.
Extended Data Figure 9 | Growth of three-junction MoSe0.96S1.04–WSe0.92S1.08 lateral alloy heterostructure. a, b, Low-magnification optical images of three-junction MoSe2(1−x)S2x–WS2(1−y)Se2y monolayer lateral heterostructure (corresponding to the optical image in Fig. 3b). c, d, Typical large-area SEM image (c) and high-magnification SEM image (d) of a single island showing the presence of different growth rates along the vertex and the axial directions. The MoSe2(1−x)S2x growth along the vertex direction is less than that along the axial direction.
Extended Data Figure 10 | Optical properties of multi-junction MoSe0.96S1.04–WSe0.92S1.08 lateral heterostructure. a, SEM image of a three-junction MoSe2(1−x)S2x–WS2(1−y)Se2y monolayer lateral heterostructure. Scale bar, 2 μm. b, c, Raman (b) and photoluminescence (c) spectra of points 1 to 4 and interfaces. d, Normalized photoluminescence spectra from a line scan perpendicular to the three junctions, regions 1 to 4 in a, as indicated in the inset of Fig. 3e (λexc = 633 nm). e–g, Raman intensity maps corresponding to the optical image in Fig. 3b, at frequencies 412 cm−1 (A1g(S-W) modes, e), 402 cm−1 (A1g(S-Mo) mode, f) and 354 cm−1 (E2g(S-W) modes, g). h, Raman position mapping between 399 and 417 cm−1. There is a thin line of MoSe2(1−x)S2x between the WS2(1−y)Se2y strip along the vertex direction, which could not be resolved during the Raman mapping. i, j, Photoluminescence intensity maps, corresponding to the optical image in Fig. 3b, at 1.67 eV (MoSe0.96S1.04 domain, i) and 1.8 eV (WSe0.92S1.08 domain, j). Scale bars: e–j, 10 μm.
Perovskite nickelates as electric-field sensors in salt water

Zhen Zhang1*, Derek Schwanz1*, Badri Narayanan2†, Michele Kotiuga3, Joseph A. Dura4, Mathew Cherukara2, Hua Zhou5, John W. Freeland5, Jiarui Li6, Ronny Sutarto7, Feizhou He7, Chongzhao Wu8, Jiaxin Zhu8, Yifei Sun1, Koushik Ramadoss1, Stephen S. Nonnenmann9, Nanfang Yu8, Riccardo Comin6, Karin M. Rabe3, Subramanian K. R. S. Sankaranarayanan2 & Shriram Ramanathan1


Designing materials to function in harsh environments, such as conductive aqueous media, is a problem of broad interest to a range of technologies, including energy, ocean monitoring and biological applications. The main challenge is to retain the stability and morphology of the material as it interacts dynamically with the surrounding environment. Materials that respond to mild stimuli through collective phase transitions and amplify signals could open up new avenues for sensing. Here we present the discovery of an electric-field-driven, water-mediated reversible phase change in a perovskite-structured nickelate, SmNiO3. This prototypical strongly correlated quantum material is stable in salt water, does not corrode, and allows exchange of protons with the surrounding water at ambient temperature; the concurrent changes in its electrical resistance and optical properties allow multimodal readout. Besides operating both as thermistors and as pH sensors, devices made of this material can detect sub-volt electric potentials in salt water. We postulate that such devices could be used in oceanic environments for monitoring electrical signals from various maritime vessels and sea creatures.

Pristine SmNiO3 (SNO), a quantum material in the family of strongly correlated electron systems, is a perovskite-structured rare-earth nickelate. The high ionic conductivity that has been noted in SNO solid-state fuel cells, comparable to that of the best-performing proton conductors, is due in part to its covalent ground state and low-energy phonon modes. Figure 1a illustrates SNO submerged in water in the presence of an electric bias generated by a counter-electrode. Under negative electric potentials, protons intercalate into the SNO lattice, accompanied by an uptake of electrons released by oxidation at the counter-electrode. As a result, a saltwater-mediated transition from pristine SNO to hydrogenated SNO (HSNO) occurs under bias. This proton influx is accompanied by a modification of the electronic configuration of the Ni 3d orbitals, unlike in electrochromic oxides such as WO3, where a transition to a metallic state occurs upon cation doping (Supplementary Information section 1). The doping-driven resistance change for the uptake of one electron per formula unit is about 10,000 times larger for SNO than for WO3. As Fig. 1b, c shows, the partially filled eg orbital with a small transport gap for charge carriers in SNO becomes half-filled in HSNO, where strong Mott–Hubbard electron–electron interaction arises and localizes the charge carriers.
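The change in eg filling can be checked with simple electron counting. The sketch below assumes an octahedral Ni site in which the t2g level (6 electrons) fills completely before the eg level (4 electrons), that is, a low-spin picture for Ni3+, and uses the Ni3+ (SNO) and Ni2+ (HSNO) valences indicated in Fig. 1; the function name is hypothetical.

```python
def eg_filling(ni_valence):
    """Fraction of the four eg spin-orbital states occupied for Ni^(valence+)."""
    # Ni has 10 electrons outside [Ar] (3d8 4s2), so Ni3+ is d7 and Ni2+ is d8
    d_electrons = 10 - ni_valence
    # Assumed filling order: t2g takes the first 6 electrons, the rest go to eg
    eg_electrons = max(0, d_electrons - 6)
    return eg_electrons / 4

print("SNO  (Ni3+):", eg_filling(3))  # 0.25 -> partially filled
print("HSNO (Ni2+):", eg_filling(2))  # 0.5  -> half-filled
```

Adding one electron per Ni (Ni3+ to Ni2+) therefore moves the eg manifold from quarter filling to half filling, the regime in which on-site repulsion U localizes the carriers.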

To demonstrate the response of the perovskite devices when they encounter an electric bias in salt water, SNO thin films were incorporated into a three-terminal electrochemical cell as working electrodes. An electric potential was applied across the aqueous solution using a graphite counter-electrode, referenced to a standard Ag/AgCl reference electrode. Because water is generally a harsh environment for oxides and the formation of hydroxides can be accompanied by massive crystal-structure changes (for example, in hydrated cobaltites), the stability of SNO in aqueous solution was first investigated without electric potentials. Figure 1d compares the temperature-dependent electrical resistivity of pristine SNO with that of an SNO thin film submerged for 24 h in a 0.6 M NaCl solution at room temperature. Nearly identical resistivity–temperature curves are observed for both samples, indicating stability. The expected thermally induced insulator–metal transition at about 130 °C, which is often used as an indicator of film quality, is still present in submerged SNO (Extended Data Fig. 1a). SNO is robust in both weakly acidic (0.01 M citric acid, pH = 2.7) and basic (0.01 M KOH, pH = 12) solutions (Extended Data Fig. 1b) over 180 min. This stability of SNO in aqueous environments over a range of pH values enables us to study its response to electric bias in a systematic manner (Supplementary Information section 2). Moreover, the open-circuit potential of SNO relative to the standard Ag/AgCl electrode varies with the pH of the aqueous solution; this feature, together with its temperature-dependent electrical resistivity, enables SNO to operate as a local environmental sensor (Extended Data Fig. 2a, b).

Figure 1e shows the electrical resistivity of SNO after applying negative electric potentials of up to −4.0 V (versus the Ag/AgCl electrode) in a 0.6 M NaCl solution, the salinity of which is similar to that of sea water. Upon application of a negative electric potential, the electrical resistivity of SNO increases by more than five orders of magnitude, along with a noticeable colour change (Fig. 1e). The resistivity of water-treated SNO (Fig. 1d, red curve) decreases smoothly with increasing temperature, indicating an insulating state with localized electrons. The substantial change in the electrical properties of water-treated SNO is non-volatile under ambient conditions (Extended Data Fig. 1c), indicating that it is not a simple electrostatic field effect induced by charge accumulation at the surface, but a water-mediated phase transition. The observed phase change occurs regardless of the aqueous solution (examples include 0.01 M KOH, 0.6 M NaCl and 0.01 M citric acid) and substrate (for example, LaAlO3 (001) and Si (100)) used. A cross-sectional conducting atomic force microscopy (conducting AFM) image of SNO, acquired after the film was used for sensing an electric potential in aqueous solution (Fig. 1f, topography), shows that no corrosion occurs beneath the SNO–water interface and that the thin film retains its structural integrity. The cross-sectional conducting AFM image of the current reveals that the insulating phase (black


1School of Materials Engineering, Purdue University, West Lafayette, Indiana 47907, USA. 2Center for Nanoscale Materials, Argonne National Laboratory, Argonne, Illinois 60439, USA. 3Department of Physics and Astronomy, Rutgers University, Piscataway, New Jersey 08854, USA. 4NIST Center for Neutron Research, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA. 5X-ray Science Division, Advanced Photon Source, Argonne National Laboratory, Argonne, Illinois 60439, USA. 6Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. 7Canadian Light Source, University of Saskatchewan, Saskatoon, Saskatchewan S7N 2V3, Canada. 8Department of Applied Physics and Applied Mathematics, Columbia University, New York 10027, USA. 9Department of Mechanical and Industrial Engineering, University of Massachusetts - Amherst, Amherst, Massachusetts 01003, USA.
†Present address: Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, USA.


*These authors contributed equally to this work. 


00 MONTH 2017 | VOL 000 | NATURE |1 


© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


[Figure 1 graphic. a, schematic of proton intercalation into SNO under an electric field E; b, c, Ni 3d level schemes for HSNO (Ni2+, localized electrons) and pristine SNO (Ni3+, itinerant electrons); d, resistivity versus temperature (°C) for pristine SNO, SNO after 24 h in 0.6 M NaCl, after −2.0 V (0.6 M NaCl, 5 min) and after 2.0 V (0.6 M NaCl, 10 min); e, resistivity versus potential vs Ag/AgCl (V); f, cross-sectional AFM topography and current maps; g, resistance change versus bias potential (V), marked with UUV and marine species including Uranoscopus bicinctus, Sphyrna tiburo, Narcine brasiliensis, Mercenaria mercenaria, Astroscopus y-graecum and Chilomycterus schoepfi.]


Figure 1 | Saltwater-submersible nickelate sensors. a, Illustration of the saltwater-mediated phase transition in SNO. Under bias, protons intercalate and diffuse into the SNO lattice, accompanied by electron transfer from the counter-electrode (E, electric field). b, c, Schematics of the electronic structure of Ni 3d orbitals in hydrogenated (b) and pristine (c) SNO. The electrons become localized in HSNO owing to the strong Coulomb repulsion in doubly occupied eg orbitals above the t2g orbitals. U represents the on-site electron–electron correlation. d, After being submerged in a 0.6 M NaCl solution for 24 h at room temperature, the electrical resistivity of SNO is similar to that of pristine SNO, indicating its robustness in water. The red curve shows increased electrical resistivity after applying a negative bias of −2.0 V in a 0.6 M NaCl solution for 5 min. The sample is then treated under a reverse bias of 2.0 V for 10 min, and its


colour, HSNO) propagates into the SNO thin film from the water interface, indicating the intercalation and diffusion of protons into SNO during sensing. Additional conducting AFM images taken in-plane (Extended Data Fig. 3 and Supplementary Information section 3) further rule out corrosion or morphological degradation during water treatment as the origin of the resistance change and demonstrate the uniformity and diffusional nature of the water-mediated phase transition in SNO.

The water-treated thin films can be brought back to the low-resistivity state by applying a reverse bias (Fig. 1d and Extended Data Fig. 1a, purple curve), indicating their capability to detect local fluctuations of an electric potential in water. We find that the electrical resistivity of SNO changes consistently following the application of bias potentials ranging from ±0.5 V to ±0.005 V over multiple cycles (Extended Data Fig. 2c). Figure 1g shows the modulation of the electrical resistance of SNO when bias potentials down to the millivolt level are applied to evaluate its measurement sensitivity. Extrapolating this sensitivity suggests that, with high-resolution resistance measurements (see Methods), our SNO device could detect microvolt-level potentials in the ocean, across the entire range


electrical resistivity curve (purple) is almost recovered to its original state. e, Electrical resistivity and colour change of SNO thin films after applying various bias potentials in a 0.6 M NaCl solution for 10 s. f, Cross-sectional conducting AFM images of SNO after bias treatment, showing reduced current owing to the phase change. g, The experimentally observed change in electrical resistance (ΔR) for bias potentials from 0.5 V to 5 mV (Extended Data Fig. 2c). This measurement range spans all maritime vessels and several marine animals, as marked. The resistance change beyond the present measurement window (<5 mV) is estimated by linear extrapolation. With high-resolution resistance measurements (100 nΩ; see Methods), the SNO device would be sensitive down to about 4.5 μV, extending to even smaller marine animals. Error bars show standard deviations. UUV, unmanned underwater vehicle.


of bioelectric potentials generated by numerous marine species up to 
galvanic potentials from ships and unmanned underwater vehicles'!"'®. 
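The Fig. 1g caption notes that the resistance change below the 5 mV measurement window was estimated by linear extrapolation, and that a 100 nΩ resistance resolution would place the detection limit near 4.5 μV. A minimal Python sketch of that kind of estimate is given below, assuming that ΔR is linear in the bias over the extrapolated region; the function name and the numbers in the check are hypothetical, not the measured data.

```python
def extrapolated_detection_limit(bias_v, delta_r_ohm, resolution_ohm):
    """Bias (V) at which a least-squares line through (bias, delta_R)
    reaches the resistance resolution of the meter.

    Assumption: the response is linear in the extrapolated region; the
    exact fitting form used for Fig. 1g is not stated in the text.
    """
    n = len(bias_v)
    if n != len(delta_r_ohm) or n < 2:
        raise ValueError("need at least two matching (bias, delta_R) pairs")
    mean_x = sum(bias_v) / n
    mean_y = sum(delta_r_ohm) / n
    sxx = sum((x - mean_x) ** 2 for x in bias_v)
    if sxx == 0:
        raise ValueError("bias values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bias_v, delta_r_ohm))
    slope = sxy / sxx                 # ohm per volt
    intercept = mean_y - slope * mean_x
    if slope == 0:
        raise ValueError("flat response: no finite detection limit")
    # Solve intercept + slope * V = resolution for V.
    return (resolution_ohm - intercept) / slope
```

With a made-up slope of 0.02 Ω V⁻¹, a 100 nΩ resolution corresponds to a bias of 5 μV; the real slope would have to come from the Fig. 1g data.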

The sensing mechanism of SNO is analogous to the electroreception 
organs of elasmobranch species such as sharks, rays and skates: the 
ampullae of Lorenzini!’~'®. These ampullae are located around the 
mouths of sharks”°; the distinctive structure of a single ampulla is 
shown schematically in Extended Data Fig. 2d. The jelly inside the 
ampulla, which has excellent proton conductivity”! and enables thermal 
sensing”’, conducts ions from the nearby sea water to the membrane 
located at the bottom of the ampulla. The membrane contains sensing 
cells that react to an electric potential applied across them (Extended 
Data Fig. 2d). Under electric bias, ionic channels on the apical side of 
the sensing cells open and allow a flux of charged ions, which causes 
the sensing cell to release neurotransmitters to synapses at the bottom, 
informing the brain!”?°. Thus, the ampullae of Lorenzini enable these 
sharks to detect bioelectric fields emitted by prey fish'®. This suggests 
an analogy between the nickelate sensor and the electroreception organ 
of sharks. We calculated the detection distance of SNO and found a 
similar length scale to what has been reported for the elasmobranch 
electroreceptors (Extended Data Fig. 2e). Furthermore, the response 
Figure 2 | Mechanism of electric-field sensing in perovskite nickelate. a, Comparison of in situ synchrotron XRR curves for pristine and water-treated SNO thin films after applying bias for 3 min and 9 min successively. The inset shows a magnified area of the XRR curves normalized to the oscillation peak for a scattering vector Q ≈ 0.16 Å⁻¹ (a.u., arbitrary units). b, Neutron reflectometry data, error bars and associated fits for pristine, hydrogenated and deuterated SNO thin films. Error bars represent one standard error. The inset shows a magnified area comparing oscillations normalized to the peak at Q ≈ 0.03 Å⁻¹. c, X-ray absorption curves of the Ni L3 edge of pristine and water-treated SNO, compared with that”* of NiO. d, X-ray absorption curves of the O K edge of pristine and water-treated SNO. e, Optical transmission spectra of water-treated SNO, showing 100% increased transmissivity in the near-infrared. f, Comparison of the real (n) and imaginary (k) parts of the refractive indices of water-treated SNO and dry H2-gas-treated SNO*’.
time of our SNO devices is in the same range as that of the elasmobranch electroreceptors (Extended Data Fig. 2f).

Charge transfer was observed during the water-mediated phase transition of SNO, as expected. A cathodic current peak (Extended Data Fig. 4a, b) appears in cyclic voltammograms of SNO at negative potentials, indicating a reduction reaction. The magnitude of the electric potential needed to trigger the reduction of SNO increases with increasing pH (Extended Data Fig. 4a), which links the water-mediated conductance modulation to the activity of protons in the aqueous solution. Fitting the scan-rate dependence of the cathodic current peak with the Randles–Sevcik equation” indicates that Ni³⁺ in SNO is almost fully reduced to Ni²⁺ after the reaction (Extended Data Fig. 4c and Supplementary Information section 4).
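The fitting details are in Supplementary Information section 4 and are not reproduced here. As a hedged illustration only, the sketch below uses the textbook Randles–Sevcik form for a reversible, diffusion-controlled couple, i_p = 0.4463 nFAC (nFvD/RT)^(1/2), fits i_p against √v through the origin and solves for the number of electrons n. The electrode area, concentration and diffusion coefficient are hypothetical inputs; how the authors chose them is not stated in this section.

```python
import math

F = 96485.33212   # Faraday constant, C/mol
R = 8.314462618   # molar gas constant, J/(mol K)

def randles_sevcik_peak_current(n, area_cm2, conc_mol_cm3, diff_cm2_s,
                                scan_v_s, temp_k=298.15):
    """Peak current (A) for a reversible, diffusion-controlled couple:
    i_p = 0.4463 n F A C sqrt(n F v D / (R T))."""
    return 0.4463 * n * F * area_cm2 * conc_mol_cm3 * math.sqrt(
        n * F * scan_v_s * diff_cm2_s / (R * temp_k))

def electrons_from_scan_rates(scan_rates_v_s, peak_currents_a, area_cm2,
                              conc_mol_cm3, diff_cm2_s, temp_k=298.15):
    """Estimate n from the slope of i_p versus sqrt(v), fitted through the origin."""
    xs = [math.sqrt(v) for v in scan_rates_v_s]
    slope = sum(x * i for x, i in zip(xs, peak_currents_a)) / sum(x * x for x in xs)
    # i_p = prefactor * n**1.5 * sqrt(v), hence n = (slope / prefactor)**(2/3)
    prefactor = 0.4463 * F * area_cm2 * conc_mol_cm3 * math.sqrt(
        F * diff_cm2_s / (R * temp_k))
    return (slope / prefactor) ** (2.0 / 3.0)
```

For a proton-coupled Ni³⁺ → Ni²⁺ reduction one would expect n close to 1; whether a solution-phase diffusion model is the right description of a thin solid film is itself an assumption of this sketch.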

To investigate the microscopic mechanism of environmental sensing, we performed in situ synchrotron X-ray reflectivity (XRR) measurements (Extended Data Fig. 5) on SNO submerged in a 0.01 M KOH aqueous solution. Upon applying a bias potential of −1.5 V (Fig. 2a), the oscillation period of the XRR curves decreases with increasing duration of the applied potential (Fig. 2a inset), indicating a substantial increase in film thickness**. Synchrotron XRR investigations (Extended Data Fig. 6) show that this expansion occurs regardless of the solution type and is attributed to the increased lattice constant of SNO after the treatment.
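Why a shorter oscillation period means a thicker film: for Kiessig fringes well above the critical angle, the film thickness is approximately d ≈ 2π/ΔQ, where ΔQ is the fringe spacing. The sketch below applies only that textbook approximation (it ignores refraction near the critical angle and is not the full model fitting used for the XRR and neutron data); the fringe positions in the check are synthetic.

```python
import math

def thickness_from_fringes(q_positions):
    """Approximate film thickness from successive fringe positions in Q,
    using d ~ 2*pi / spacing. Result is in the reciprocal unit of Q
    (angstroms if Q is in 1/angstrom)."""
    n = len(q_positions)
    if n < 2:
        raise ValueError("need at least two fringe positions")
    mean_i = (n - 1) / 2
    mean_q = sum(q_positions) / n
    # Least-squares slope of fringe position versus fringe index = mean spacing.
    num = sum((i - mean_i) * (q - mean_q) for i, q in enumerate(q_positions))
    den = sum((i - mean_i) ** 2 for i in range(n))
    return 2 * math.pi / (num / den)

def percent_expansion(d_before, d_after):
    """Relative thickness change in percent."""
    return 100.0 * (d_after - d_before) / d_before
```

For a 70 nm (700 Å) film the fringe spacing is about 0.009 Å⁻¹, and a 6.9% expansion, the value quoted for the neutron data, would shrink that spacing by the same fraction.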

Although it is extremely challenging to detect protons directly in solids, neutron scattering is among the best available techniques and is also sensitive to hydrogen isotopes. Fig. 2b shows neutron reflectivity curves for the pristine SNO film and for films treated in D2O (deuterated 'heavy' water) and H2O solutions. The decrease in the oscillation period of the neutron reflectivity curves after H2O or D2O treatment (Fig. 2b inset) corresponds to a film expansion of about 6.9% and a decrease in film density (Supplementary Information section 5). The fitted neutron scattering length density profiles (Extended Data Fig. 7) show a considerable increase in the SNO region when H2O is replaced with D2O. This is consistent with the positive coherent neutron scattering length of deuterium, in contrast to the negative scattering length of hydrogen. The isotopic substitution results therefore clearly show the intercalation and transport of H⁺ (or D⁺) from the solution into SNO, which is similar to the ion transfer observed in the membranes of the ampullae of Lorenzini.
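The isotope contrast can be made concrete. The tabulated coherent neutron scattering length of hydrogen is negative (about −3.739 fm), whereas that of deuterium is positive (about 6.671 fm), so intercalated D raises the scattering length density (SLD) of the film while intercalated H lowers it. The sketch below computes the SLD contribution of a given number of hydrogen-isotope atoms in a given volume; the volumes in the check are hypothetical round numbers, and a full SLD profile would also need the host elements' scattering lengths and the fitted film density, which are not reproduced here.

```python
# Coherent neutron scattering lengths in fm (tabulated reference values).
# Only the hydrogen isotopes are listed; host elements would be added by the caller.
SCATTERING_LENGTH_FM = {"H": -3.739, "D": 6.671}

def sld_1e6_per_A2(counts, volume_A3, lengths_fm=SCATTERING_LENGTH_FM):
    """Scattering length density in units of 1e-6 A^-2.

    counts: number of atoms of each element inside a volume of volume_A3 cubic angstroms.
    """
    total_fm = sum(n * lengths_fm[element] for element, n in counts.items())
    # 1 fm = 1e-5 A; reporting in units of 1e-6 A^-2 multiplies by 1e-5 / 1e-6 = 10.
    return 10.0 * total_fm / volume_A3
```

Because the two isotopes contribute with opposite signs, swapping H2O for D2O changes the fitted SLD of the intercalated region in a direction that cannot be explained by density changes alone.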

Synchrotron X-ray absorption spectroscopy (XAS) measurements were carried out near the Ni L3 edge and the O K edge of water-treated SNO. As Fig. 2c shows, with proton intercalation the weight of the Ni L3-edge absorption peak shifts from 855 eV to 853 eV, similar to the absorption edge seen”® in NiO, indicating that Ni is divalent after water treatment. Because the Ni–O bond in SNO is covalent, with a ground-state electronic configuration dominated by the 3d⁸L state (where L denotes a ligand hole in the O 2p orbital)”°°, we further studied the evolution of the O K edge after the water treatment. As Fig. 2d shows, the O K-edge absorption peak at 529 eV is suppressed after the treatment, indicating a reduced oxygen-projected density of unoccupied states caused by doping-induced band filling. Therefore, the XAS results for both the Ni L3 edge and the O K edge demonstrate the formation of Ni²⁺ with the 3d⁸ configuration, resulting from charge transfer associated with proton intercalation during the water treatment. This charge transfer is consistent with the cyclic voltammetry observations.

Optical spectra of water-treated SNO were measured for comparison with gas-phase proton-doped samples (that is, samples never exposed to water). As Fig. 2e shows, the transmissivity of water-treated SNO in the near- and mid-infrared increases compared with that of pristine SNO. Accordingly, water-treated SNO becomes more transparent at infrared and visible wavelengths (Extended Data Fig. 8d). The reflectivity and absorptivity (Extended Data Fig. 8a, b) decrease concomitantly in water-treated SNO owing to the localization of charge carriers and the opening of the optical bandgap. The real and imaginary parts (n and k, respectively) of the refractive index of water-treated SNO (Fig. 2f), calculated using the transfer matrix formalism (Supplementary Information section 6), agree well with those of HSNO treated with dry H2 gas without exposure to water?’. Finite-difference time-domain simulations of the optical spectra of water-treated SNO (Extended Data Fig. 8c) further support this conclusion. These results show that the primary mechanism of sensing is proton intercalation into the SNO lattice, not corrosion or degradation of the perovskite. The overall reaction taking place in SNO during electric potential sensing in an
Figure 3 | First-principles calculations of SNO–water interaction and HSNO. a, AIMD simulations of water-mediated protonation of a SNO surface at 300 K. The top images show the evolution of a representative water molecule and the NiO6 octahedra in the SNO layer closest to water. b, Surface stability of SNO, characterized by the Ni–O pair distribution function as a function of the separation distance r at various time intervals. c, Energy landscape and atomic-scale pathway of intercalation of surface protons into the SNO lattice. The potential energy is shown along the most preferred migration pathway, together with selected configurations along this pathway (labelled I1–I4). d, First-principles calculation of


aqueous environment is therefore $\mathrm{H}^{+}_{\mathrm{aqueous}} + \mathrm{SNO}_{\mathrm{solid}} + e^{-} \rightleftharpoons \mathrm{HSNO}_{\mathrm{solid}}$, where the valence of the Ni ions is reduced from Ni(III) in SNO to Ni(II) in HSNO.
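The refractive indices in Fig. 2f were obtained with the transfer matrix formalism (Supplementary Information section 6, not reproduced here). As a hedged sketch of the forward model only, the code below computes normal-incidence reflectance and transmittance of one homogeneous film on a semi-infinite, non-absorbing substrate with the characteristic-matrix method; retrieving n and k would mean fitting such a model to measured spectra. Assumptions: coherent single layer, no back-surface reflection from the substrate, and the N = n − ik sign convention that goes with this matrix form. The function name and test parameters are hypothetical.

```python
import cmath

def film_on_substrate_rt(n_film, k_film, thickness_nm, wavelength_nm,
                         n_substrate, n_ambient=1.0):
    """Normal-incidence (R, T) for one film on a non-absorbing substrate,
    using the characteristic-matrix method with N = n - ik."""
    N = complex(n_film, -k_film)
    delta = 2 * cmath.pi * N * thickness_nm / wavelength_nm  # phase thickness
    # Characteristic matrix of the film: [[m11, m12], [m21, m22]]
    m11 = cmath.cos(delta)
    m12 = 1j * cmath.sin(delta) / N
    m21 = 1j * N * cmath.sin(delta)
    m22 = cmath.cos(delta)
    # [B, C] = M @ [1, n_substrate]
    B = m11 + m12 * n_substrate
    C = m21 + m22 * n_substrate
    denom = n_ambient * B + C
    r = (n_ambient * B - C) / denom
    R = abs(r) ** 2
    T = 4 * n_ambient * n_substrate / abs(denom) ** 2
    return R, T
```

For a lossless film R + T = 1, while any k > 0 makes R + T < 1, the difference being the absorptance; a drop in k after protonation therefore shows up as increased transmission.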

Ab initio molecular dynamics (AIMD) simulations were carried out to study the atomic-scale mechanisms underlying the water-mediated phase transition in SNO. As Fig. 3a shows, the water molecule proximal to the SNO surface dissociates into a free proton and OH⁻. The proton migrates to the oxide/water interface and binds to a surface oxygen atom (Supplementary Video 1). We find an increased uptake of protons by the surface oxygen atoms of SNO, which has little impact on the structural integrity of the oxide interface (Fig. 3a, t = 10 ps). The protonation of the SNO surface and its structural stability in water are also observed at elevated temperature (500 K; Extended Data Fig. 9a, b and Supplementary Video 2) and with either excess OH radicals or excess protons. Besides primary surface events, such as proton migration and binding, the AIMD trajectories reveal that the OH (arising from water dissociation) can bind to the under-coordinated Ni atoms (fewer than


4 | NATURE | VOL 000 | 00 MONTH 2017 


[Figure 3 plot residue: b, Ni–O pair distribution function (a.u.); d, density of states versus E − E_F (eV) for 0 to 1 H/SNO, with occupied Ni–O, unoccupied Ni–O and unoccupied Sm states labelled.]
[…] SNO. The top panel shows the total density of states (grey) as a function of the difference between the energy (E) and the Fermi energy (E_F), with 0–1 added H atoms per SNO formula unit. The unoccupied projected density of states on each Ni site and the Ni projected density of states associated with localized electrons are shown in orange and purple, respectively, corresponding to inequivalent Ni sites. The lower panel shows a schematic of the occupied Ni eg levels for each scenario. Darker hues indicate Ni sites with two occupied eg states, and the colours correspond to those in the upper panel.


six O nearest neighbours) on the surface and restore the NiO6 octahedra (Fig. 3a, t = 10 ps), further improving the surface stability. The Ni–O pair distribution functions (Fig. 3b and Extended Data Fig. 9c) remain sharp and well defined, and their peaks can be resolved even at long distances (above 5 Å), indicating that the long-range structural order in SNO is preserved. The stability of the SNO surface stems in part from the high vacancy formation energies in pristine SNO; for example, the oxygen vacancy formation energy in SNO obtained from our density functional theory (DFT) calculations is 2.95 eV, more than three times the energy barrier for proton intercalation (0.9 eV; Fig. 3c).

Nudged elastic band calculations in the framework of DFT were performed to estimate the energy landscape and to identify the energetically preferred pathways for H⁺ intercalation into SNO (Fig. 3c). Initially, the proton is bound to a surface oxygen atom O1 (image I1 in Fig. 3c) at a distance of about 3.7 Å from atom O2, where O1 is the shared corner of two NiO6 octahedra centred at nickel atoms Ni1 and Ni2. The Ni1–O1–Ni2 bond angle is about 145° and the length of the Ni2–O2 bond is approximately 2.5 Å, similar to that of bulk SNO. The proton first rotates about O1 while remaining bound to it, so that it enters the sub-surface layer of the SNO slab and reduces the O2–H distance to about 2.6 Å (image I2 in Fig. 3c). This rotation of the O1–H bond distorts the surface layer considerably, as manifested in an increased Ni2–O2 distance of about 3.43 Å and a substantial change of the Ni1–O1–Ni2 angle (to about 153°). This surface distortion process is associated with a barrier of about 0.9 eV. Further rotation of the O1–H bond brings the proton close to O2 (O2–H distance of approximately 1.5 Å; image I3 in Fig. 3c) while increasing the inter-octahedral angle Ni1–O1–Ni2 to about 175°, and leads to the concurrent healing of the Ni2–O2 bond (image I4 in Fig. 3c), which preserves the SNO framework upon intake of protons. Once the proton is intercalated into the SNO lattice, intra-octahedral proton hopping occurs even at room temperature, as seen in the ab initio simulations (Supplementary Video 3), indicating facile proton diffusion within SNO. The calculated energy barrier for proton migration within the SNO lattice is about 0.27 eV, which is low compared with those of other proton-conducting oxides” (0.4–0.6 eV). Volume expansion is observed in simulations after proton intercalation (Extended Data Fig. 10a–g and Supplementary Information section 7), in agreement with the reflectometry measurements.
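To put the 0.27 eV migration barrier in perspective, a simple Arrhenius picture gives the ratio of hopping rates for two barriers at the same temperature. The sketch assumes that both processes share the same attempt frequency, which is an idealization; it is an order-of-magnitude illustration, not a calculation reported in the paper.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def hop_rate_ratio(barrier_low_ev, barrier_high_ev, temperature_k=300.0):
    """Ratio exp(-Ea_low / kT) / exp(-Ea_high / kT) of Arrhenius hopping rates,
    assuming identical attempt frequencies for both processes."""
    return math.exp((barrier_high_ev - barrier_low_ev) / (K_B_EV * temperature_k))
```

At 300 K, kT ≈ 0.0259 eV, so a 0.27 eV barrier hops roughly 150 times faster than a 0.4 eV barrier and about 3.5 × 10⁵ times faster than a 0.6 eV barrier under this assumption.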

Density functional theory calculations (Fig. 3d) show that as each 
hydrogen atom is added to the supercell, its electron is transferred to a 
previously unoccupied Ni-O orbital of pristine SNO. The transfer of the 
electron can also be seen in the difference between the charge density 
of the combined HSNO system and that of the sum of SNO and H, 
which shows charge depletion around the H and a corresponding 
charge accumulation around the adjacent Ni and O (Extended Data 
Fig. 10h, i). Owing to electron-electron correlations, this state is 
shifted down into the valence band, while the remaining unoccupied 
states of that Ni are shifted up in energy. The bandgap remains almost 
unchanged until an electron has been added to each Ni atom. Both 
of these observations are consistent with the experimentally observed 
changes in X-ray absorption spectra and the increase in the electrical 
resistance of SNO that enables effective sensing. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Synthesis of SmNiO3 thin films. SmNiO3 (SNO) was synthesized by physical vapour deposition followed by ultrahigh-pressure annealing in pure oxygen. Substrates were cleaned with acetone and isopropanol and then dried with compressed Ar. SNO thin films were deposited on the substrates by magnetron co-sputtering of Sm and Ni targets at room temperature in an Ar/O2 mixture at 0.67 Pa. To obtain the appropriate stoichiometric ratio, Sm was deposited at 160 W (radiofrequency sputtering) and Ni at 80 W (direct-current sputtering). The stoichiometric ratio of Sm and Ni was analysed using energy-dispersive X-ray spectroscopy. The substrates were rotated during deposition to ensure compositional homogeneity. The deposited samples were then annealed for 24 h at 500 °C under high-pressure O2 (10⁷ Pa) in a home-built vessel to form the perovskite phase. Both epitaxial and polycrystalline SNO thin films were used in this work to demonstrate the generality of the water-mediated phase transition in SNO. Epitaxial SNO thin films were obtained on single-crystalline LaAlO3 (001) substrates, while polycrystalline SNO thin films were grown on Si (100) wafers.

Aqueous solution preparation. To mimic the salinity of seawater, a 0.6 M NaCl aqueous solution, which has an electrical conductivity comparable to that of seawater (5 S m⁻¹), was prepared by dissolving reagent-grade NaCl in ultrapure (18.2 MΩ cm) water. Experiments were performed at ambient temperature unless otherwise noted. Additional aqueous solutions with a much wider pH range than seawater were studied, including standard buffers (pH = 4.0, 7.0 and 10.0), a weakly acidic solution containing no salt (pH = 2.7; 0.01 M citric acid in H2O) and a weakly basic solution (pH = 12.0; 0.01 M KOH in H2O) (Supplementary Information section 2). These aqueous environments were designed to cover the wide ranges of temperature and pH found across Earth's oceans.
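Extended Data Fig. 2a reports that the open-circuit potential V_OC of SNO decreases monotonically with pH in standard buffers and describes the relation as linear, which is what lets SNO act as a pH sensor. A sketch of the resulting two-step procedure, calibrating against buffers such as the pH 4.0, 7.0 and 10.0 standards listed above and then inverting the line, follows. The function names and the 50 mV per pH unit slope in the check are assumptions for illustration, not values from the paper.

```python
def calibrate_ph_sensor(buffer_ph, voc_v):
    """Least-squares line V_OC = intercept + slope * pH from buffer readings.
    Returns (slope in V per pH unit, intercept in V)."""
    n = len(buffer_ph)
    if n != len(voc_v) or n < 2:
        raise ValueError("need at least two matching (pH, V_OC) readings")
    mx = sum(buffer_ph) / n
    my = sum(voc_v) / n
    sxx = sum((x - mx) ** 2 for x in buffer_ph)
    sxy = sum((x - mx) * (y - my) for x, y in zip(buffer_ph, voc_v))
    slope = sxy / sxx
    return slope, my - slope * mx

def ph_from_voc(voc_v, slope, intercept):
    """Invert the calibration line to estimate the pH of an unknown sample."""
    return (voc_v - intercept) / slope
```

A linear calibration is only as good as the linearity assumption over the bracketed range; readings outside the buffer range (for example below pH 4) are extrapolations.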

Sensing experiments in water environments. SNO thin films were incorporated into a three-terminal electrochemical cell. Platinum wire was bonded to the thin films with silver paste, and polyethylene masks (Gamry) were used to expose selected areas of SNO. The SNO film was then submerged in the aqueous solution and connected as the working electrode. The counter-electrode was a graphite rod with a large surface area. A Ag/AgCl (saturated KCl) reference electrode was used to control and modulate the applied electric potential. A static electric potential was applied to SNO by a potentiostat. Cyclic voltammetry was performed on SNO samples in the same three-terminal configuration. The sensing and electrochemical tests were performed with a Solartron 1260A electrochemical analyser.

Electrical measurements. After the water-based treatment, the SNO samples were 
removed from the aqueous solution, rinsed with deionized water, and dried with 
argon gas. Contact electrodes (Pt) were patterned on the treated area. The SNO 
thin films were then transferred to a controlled-temperature probe station and their 
electrical resistance was measured with a Keithley 2635A source meter. 

An ohmmeter with a sensitivity of 100 nΩ is commercially available. Higher sensitivity can be obtained by using a lock-in amplifier in the amplitude- or phase-sensitive mode, as is routine in low-temperature physics research.
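The principle behind the lock-in approach mentioned above can be sketched digitally: multiply the measured signal by in-phase and quadrature references at the modulation frequency and average, which rejects the DC background and components at other frequencies. This is an illustrative sketch, not a description of the authors' instrumentation; averaging over an integer number of reference periods stands in for the low-pass filter of a real amplifier, and the signal in the check (a 1 μV modulation riding on a 1 V offset) is synthetic.

```python
import math

def lock_in(samples, sample_rate_hz, ref_freq_hz):
    """Digital lock-in demodulation.

    Returns (amplitude, phase_rad) of the component of `samples` at ref_freq_hz.
    Exact when the record spans an integer number of reference periods.
    """
    n = len(samples)
    x = y = 0.0
    for i, s in enumerate(samples):
        w_t = 2 * math.pi * ref_freq_hz * i / sample_rate_hz
        x += s * math.sin(w_t)  # in-phase channel: averages to (A/2) cos(phase)
        y += s * math.cos(w_t)  # quadrature channel: averages to (A/2) sin(phase)
    x /= n
    y /= n
    return 2 * math.hypot(x, y), math.atan2(y, x)
```

In a resistance measurement the excitation current would be modulated at the reference frequency, so only the voltage component that follows the excitation survives the averaging.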
Conducting AFM measurements. To observe the propagation of the HSNO phase after sensing an electric potential in water, cross-sectional conducting AFM measurements were carried out on SNO films (about 500 nm thick) grown on Si (100) substrates. The SNO was treated at a bias potential of −2.0 V (versus Ag/AgCl) for 30 min in a 0.6 M NaCl aqueous solution. After the treatment, the sample was mounted vertically in epoxy resin. The cross-sectional surface then underwent multiple mechanical polishing steps, with the final polish using a 1-μm-diameter diamond suspension. The opposite cross-sectional surface was coated with silver paste to form the bottom electrode. Cross-sectional conducting AFM imaging was performed with a Pt/Ir-coated tip (Arrow-CONTPt, Nanoworld; force constant 0.2 N m⁻¹) connected to a dual-gain transimpedance amplifier (ORCA) in a commercial system (Oxford Instruments/Asylum Research Cypher ES). Topographic and current images were collected simultaneously with a bias of 5.0 V applied to the bottom Ag electrode, a setpoint of 0.06 V and a scanning rate of 1 Hz. Additional conducting AFM measurements of the top surface of a water-treated SNO sample were conducted with an Asylum MFP3D stand-alone atomic force microscope using Asylum ASYELEC-01 conductive tips (Si coated with Ti/Ir). The AFM tip was grounded and a bias of 1.0 V was applied to the sample surface. A 1 MΩ resistor was connected in series with the SNO sample to limit the current and protect the conducting AFM tips. The current was amplified using current amplifiers (dual-gain, ORCA) with sensitivities of 1 V pA⁻¹ and 1 V nA⁻¹. The scanning rate was 1 Hz. For the top-surface conducting AFM measurement, a SNO thin film (70 nm) was grown on a Si (100) substrate. A selected area of the sample was treated at a bias potential of −4.0 V (versus Ag/AgCl) in a 0.6 M NaCl aqueous solution for 10 s.

X-ray reflection and diffraction measurements. Synchrotron XRR and X-ray diffraction measurements of the SNO samples were carried out on a five-circle diffractometer with χ-circle geometry (in which the sample can be rotated around the centre of the diffractometer), using an X-ray energy of 20 keV (wavelength λ = 0.6197 Å) at beamline 12-ID-D of the Advanced Photon Source of Argonne National Laboratory. The X-ray beam had a total flux of 4.0 × 10¹² photons s⁻¹ and was vertically focused by beryllium compound refractive lenses to a beam profile below 50 μm. Scans along the Qz and L directions of the HKL reciprocal space were obtained by subtracting the diffuse background contributions using two-dimensional images acquired with a two-dimensional pixel array detector (Dectris PILATUS 100K, with a 1-mm-thick Si sensor chip and 10⁵ pixels). Additional X-ray diffraction measurements over a wide range of scattering angles were carried out using a PANalytical MRD X'Pert Pro diffractometer with Cu Kα X-rays (wavelength λ = 1.5418 Å). For in situ XRR measurements, epitaxial SNO samples with a thickness of 70 nm were grown on LaAlO3 (001) substrates. Each sample was attached to an electrochemical cell (Extended Data Fig. 5) filled with a 0.01 M KOH aqueous solution. The XRR data of SNO were measured in situ after applying a bias potential of −1.5 V (versus Ag/AgCl) for 3 min and 9 min. Additional ex situ XRR measurements of SNO in various aqueous solutions were carried out. The samples were treated separately in aqueous solutions of 0.01 M citric acid and 0.01 M KOH by applying a bias potential of −4.0 V (versus Ag/AgCl) for 5 min in each case, and X-ray diffraction measurements were carried out after the latter treatment.
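The increased lattice constant invoked to explain the film expansion appears in diffraction as a shift of the film peak to lower 2θ. A minimal sketch using Bragg's law, nλ = 2d sin θ, with the Cu Kα wavelength quoted above as the default, is given below; the peak positions in the check are hypothetical, and the paper's analysis of the reciprocal-space scans is not reproduced here.

```python
import math

CU_KALPHA_A = 1.5418  # Cu K-alpha wavelength quoted in the Methods, in angstroms

def d_spacing(two_theta_deg, wavelength_a=CU_KALPHA_A, order=1):
    """Interplanar spacing (angstroms) from Bragg's law, n * lambda = 2 d sin(theta)."""
    theta = math.radians(two_theta_deg / 2)
    return order * wavelength_a / (2 * math.sin(theta))

def lattice_strain_percent(two_theta_before_deg, two_theta_after_deg,
                           wavelength_a=CU_KALPHA_A):
    """Percent change in d-spacing between two measurements of the same reflection."""
    d0 = d_spacing(two_theta_before_deg, wavelength_a)
    d1 = d_spacing(two_theta_after_deg, wavelength_a)
    return 100.0 * (d1 - d0) / d0
```

The same functions apply to the synchrotron data by passing wavelength_a=0.6197; a positive strain corresponds to a peak that has moved to lower angle.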
Neutron reflectivity measurements and heavy water studies. Neutron reflectometry was performed at the Center for Neutron Research of the National Institute of Standards and Technology using the MAGIK reflectometer”® in air, with procedures similar to those described in ref. 30. The samples were characterized in the neutron beam over the Q range 0–0.18 Å⁻¹. The neutron reflectivity data were fitted with the NIST Refl1D software package (http://www.ncnr.nist.gov/reflpak). For the isotope substitution measurement, a SNO sample with a thickness of about 70 nm was grown on a Si (100) substrate. The sample was cleaved into two pieces. One piece was first characterized in the pristine state as a reference and was then treated at −4.0 V (versus Ag/AgCl) for 30 s in a 0.01 M KOH/H2O solution. To observe the contrast from isotope substitution, the other piece was treated at −4.0 V (versus Ag/AgCl) for 30 s in a 0.01 M KOH/D2O solution. After treatment, the samples were cleaned in isopropanol and dried under ambient conditions before the measurements.

Certain commercial equipment, instruments or materials are identified in this 
paper to foster understanding. Such identification does not imply recommenda- 
tion or endorsement by the National Institute of Standards and Technology, nor 
does it imply that the materials or equipment mentioned are necessarily the best 
available for the purpose. 

X-ray absorption spectroscopy. XAS spectra were measured at beamline 10ID-2 (REIXS) of the Canadian Light Source. The absorption near the Ni L3 and O K edges was determined from the total fluorescence yield obtained with linearly polarized photons. The samples were placed in normal-incidence geometry with the electric field vector parallel to the (110) direction in a pseudocubic coordinate system. All spectra were measured at 20 K. For the X-ray absorption measurements, SNO samples with a thickness of 70 nm were grown epitaxially on LaAlO3 (001) substrates. Treatment was carried out in a 0.01 M KOH aqueous solution under a bias potential of −4.0 V (versus Ag/AgCl) for 30 s. After the treatment, the samples were rinsed with deionized water and dried with argon gas.

Optical spectra measurements. Reflection optical spectra in the near- and mid-infrared were measured using a Fourier transform infrared spectrometer and a mid-infrared microscope. For the transmission spectra, the samples were mounted in front of the opening of a gold integrating sphere, which captured both the direct and the diffuse transmission of the samples. The signal was measured by a mercury cadmium telluride detector attached to the integrating sphere. The optical refractive indices were calculated using the transfer matrix formalism (Supplementary Information section 6). To take an infrared image, a tunable mid-infrared quantum-cascade laser was used as the light source, irradiating the sample at a wavelength of 8 μm. For optical spectra measurements, SNO samples with a thickness of 70 nm were grown on Si (100) substrates; because the optical properties of Si substrates are well known, this allowed quantitative analysis of the optical properties of SNO grown on Si. For infrared imaging, SNO samples with a thickness of 70 nm were grown on LaAlO3 (001) substrates. Treatment was carried out in a 0.01 M KOH aqueous solution under a bias potential of −4.0 V (versus Ag/AgCl) for 30 s.

AIMD simulations of SmNiO3–water interactions. AIMD simulations were performed on the Argonne Leadership Computing Facility supercomputers (2048 cores) using the generalized gradient approximation (GGA) with a Hubbard correction to treat electron localization on the Ni atoms, and the projector-augmented wave formalism, as implemented in the Vienna Ab initio Simulation Package (VASP)*!°*. The computational supercell consisted of a monoclinic SNO slab (160 atoms) with the surface normal pointing along the orthorhombic crystallographic direction [110]. Periodic boundary conditions were employed along all directions with a vacuum of about 10 Å along the surface normal. For the SNO/water simulations, this vacuum was filled with 18 water molecules to reproduce the experimental water density (1 g cm⁻³). Exchange and correlation were described by the Perdew–Burke–Ernzerhof functional*?, with the pseudopotentials Sm_3 (valence 5s² 5p⁶ 6s² 4f¹), Ni_pv (valence 3p⁶ 4s² 3d⁸) and O (valence 2s² 2p⁴) supplied by VASP. The plane-wave energy cut-off was set at 520 eV. The Brillouin zone was sampled at the Γ-point only. Using AIMD simulations in the isobaric–isothermal ensemble, we first thermalized the SNO (110) computational supercell at temperatures ranging from 300 K to 500 K and zero external pressure for 10 ps using a time step of 0.5 fs. During these simulations, the cell volume, cell shape and atomic positions were allowed to vary via the Parrinello–Rahman scheme*. The temperature was maintained by a Langevin thermostat. Next, we inserted the water molecules into the vacuum (at a given temperature). The subsequent AIMD simulations were performed in the canonical ensemble (constant volume and temperature), with the temperature maintained via a Nosé–Hoover thermostat**, as implemented in VASP. To identify the activation barriers and minimum energy paths for H intercalation into a SNO (110) slab, we employed the climbing image nudged elastic band method within the GGA + U formalism*®, where U is the on-site Coulomb parameter. The diffusion coefficient D of protons in bulk SNO at 300 K was estimated using the Einstein relation $D = \langle |\mathbf{r}(t) - \mathbf{r}(0)|^{2} \rangle / (6t)$, where $\langle |\mathbf{r}(t) - \mathbf{r}(0)|^{2} \rangle$ is the mean-square displacement of a proton at time t with respect to the time origin (t = 0); the value of D was averaged over various time domains (each of duration 0.5 ps) along the AIMD trajectory.
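The Einstein-relation estimate above can be sketched as follows for a single proton trajectory, splitting it into non-overlapping windows of equal duration (0.5 ps in the text) and averaging the squared displacement over those time origins. This is a hypothetical helper, not the authors' script; it assumes unwrapped coordinates (no jumps across periodic boundaries) at a uniform time step, and in practice one would also average over all protons.

```python
def diffusion_coefficient(positions, dt, window_steps):
    """Einstein-relation estimate D = <|r(t) - r(0)|^2> / (6 t) for one particle.

    positions: unwrapped (x, y, z) coordinates at uniform time step dt.
    window_steps: number of steps per window, so t = window_steps * dt.
    Returns D in (length unit)^2 / (time unit).
    """
    if window_steps < 1:
        raise ValueError("window_steps must be >= 1")
    n_windows = (len(positions) - 1) // window_steps
    if n_windows < 1:
        raise ValueError("trajectory shorter than one window")
    sq = 0.0
    for w in range(n_windows):
        r0 = positions[w * window_steps]         # time origin of this window
        r1 = positions[(w + 1) * window_steps]   # position one window later
        sq += sum((b - a) ** 2 for a, b in zip(r0, r1))
    msd = sq / n_windows  # mean-square displacement over the time origins
    return msd / (6.0 * window_steps * dt)
```

With positions in Å and dt in ps the result is in Å² ps⁻¹, and 1 Å² ps⁻¹ equals 10⁻⁴ cm² s⁻¹.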

First-principles electronic structure calculations of HSNO. First-principles calculations were carried out within the DFT + U approximation with the VASP code?! using the projector augmented plane-wave method of DFT°’ and the pseudopotentials Sm_3 (valence 5s² 5p⁶ 6s² 4f¹), Ni_pv (valence 3p⁶ 4s² 3d⁸), O (valence 2s² 2p⁴) and H (valence 1s¹). To treat exchange and correlation, the Perdew–Burke–Ernzerhof functional was used within the GGA* together with the rotationally invariant form of DFT + U of ref. 34, with U = 4.6 eV and J = 0.6 eV, where J is the on-site exchange parameter. For the structural determination of pristine SNO, we started with the Materials Project structure*’, added a small monoclinic distortion (β ≈ 90.75°) and allowed the cell and ionic positions to relax until the forces on each ion were lower than 0.005 eV Å⁻¹. All calculations were carried out with the tetrahedron method with Blöchl corrections*”, a 6 × 6 × 4 Monkhorst–Pack k-point mesh for the √2 × √2 × 2 supercell, and a plane-wave energy cut-off of 500 eV. To determine the structure of HSNO, we began with H0.25SmNiO3, adding one hydrogen atom at various locations to the pristine SNO √2 × √2 × 2 supercell with G-type magnetic ordering, and allowed the internal coordinates to relax with the same tolerance as described above. The structure with the lowest energy was chosen as the H0.25SmNiO3 structure. Taking the symmetry-equivalent sites of the relaxed hydrogen position, we constructed structures for H0.5–1SmNiO3, again allowing the internal coordinates to relax (Supplementary Information section 7). We compared results obtained with the total cell volume fixed and with the volume relaxed along the [110] direction only; the qualitative features of the electronic structure were not affected.

Data availability. The data that support the findings of this study are available 
from the corresponding author upon reasonable request. 
Extended Data Figure 1 | Electrical properties of water-treated SNO. a, Temperature derivative of the electrical resistivity of SNO after submersion in a 0.6 M NaCl aqueous solution for 24 h (blue curve). The insulator–metal transition temperature (T_MI), where dρ/dT changes sign from negative to positive, for submerged SNO is in the same range as reported in the literature*”*”*'. The purple curve shows the temperature derivative of the electrical resistivity of SNO obtained after applying a reverse bias of 2.0 V for 10 min to a water-treated HSNO sample, for which the metal–insulator transition recovers. b, Electrical resistivity of SNO after being submerged in solutions of 0.01 M KOH and 0.01 M citric acid. The electrical resistivity of SNO shows minimal variation over a wide range of pH values for 180 min. c, Non-volatile behaviour of SNO thin film after applying a bias of −2.0 V in a 0.6 M NaCl solution for various durations. The resistivity of SNO after sensing an electric potential remains unchanged for 120 min, demonstrating its non-volatile nature, in contrast to the surface electrostatic field effect of electric double-layer transistors.
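Panel a identifies T_MI as the temperature at which dρ/dT changes sign from negative to positive. A minimal sketch of locating that crossing from a sampled ρ(T) curve, using finite differences assigned to interval midpoints and linear interpolation of the derivative to zero, follows. The function name and the parabolic test curve are hypothetical; real data would usually need smoothing before differentiation.

```python
def transition_temperature(temps_k, rho):
    """Temperature where d(rho)/dT changes sign from negative to positive.

    Finite differences are assigned to interval midpoints, and the first
    negative-to-non-negative crossing is found by linear interpolation.
    Returns None if the derivative never changes sign that way.
    """
    if len(temps_k) != len(rho) or len(temps_k) < 3:
        raise ValueError("need at least three matching (T, rho) points")
    mids, slopes = [], []
    for i in range(len(temps_k) - 1):
        dT = temps_k[i + 1] - temps_k[i]
        mids.append(0.5 * (temps_k[i] + temps_k[i + 1]))
        slopes.append((rho[i + 1] - rho[i]) / dT)
    for j in range(len(slopes) - 1):
        s0, s1 = slopes[j], slopes[j + 1]
        if s0 < 0 <= s1:
            # Interpolate the derivative linearly between the two midpoints.
            return mids[j] - s0 * (mids[j + 1] - mids[j]) / (s1 - s0)
    return None
```

For a purely insulating curve (dρ/dT < 0 throughout) the function returns None, which is the behaviour expected for fully protonated HSNO over a limited temperature window.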


© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Extended Data Figure 2 panels: a, Voc versus pH (ocean pH range marked); b, resistivity versus temperature (°C); c, normalized resistivity versus sensing step; d, ampulla of Lorenzini (ionic conductive jelly, sensing cell, support cell); e, electric potential versus distance from mouth (cm) for Sphyraena barracuda and Ariopsis felis (ref. 11); f, response to −2.0 V and −0.5 V pulses]

Extended Data Figure 2 | pH, temperature and electric potential sensing of SNO. a, Open-circuit potential (Voc) of SNO relative to a standard Ag/AgCl electrode in standard aqueous buffers with pH values covering the pH range of Earth's oceans. Error bars show the standard deviation. Voc decreases monotonically with increasing pH. This linear relationship between proton activity (and the corresponding surface adsorption) and Voc enables SNO to operate as a pH sensor. b, Temperature-dependent electrical resistivity of SNO in the temperature range of Earth's oceans. The electrical resistivity increases with cooling; this is consistent with the insulating nature of SNO around room temperature, which enables it to function as a thermistor. c, Modulation of the normalized electrical resistivity of SNO in an aqueous environment after the application of bias potentials over multiple sensing steps. The bias potentials (versus Ag/AgCl) were ±0.5 V, ±0.05 V and ±0.005 V, and their duration was 10 s. The aqueous environment was a 0.6 M NaCl solution with salinity close to that of sea water. The normalized resistivity increases and then decreases following the reversal of the bias potential. The reversibility of the water-mediated phase transition and the facile migration of protons enable SNO to detect local fluctuations of electric signals in water. This sensing capability persists over multiple cycles, indicating its robustness in aqueous environments. d, Schematic of an ampulla of Lorenzini, an electroreception organ located around the mouth of sharks. e, Electric potential as a function of distance for teleost fishes (Sphyraena barracuda and Ariopsis felis; ref. 11). The detection ranges of elasmobranch predators (ref. 11) and of SNO sensors are shaded in blue and yellow, respectively. The calculated detection range of SNO includes the regime where the bioelectric potential of prey fishes is higher than the sensitivity of SNO (about 4.5 μV), experimentally determined from Fig. 1g. The nickelate device is estimated to detect field stimuli over a distance of tens of centimetres, which is similar in range to that of elasmobranch species. f, Experimentally measured resistance modulation of pristine SNO upon the application of pulsed bias potentials of −2.0 V and −0.5 V, respectively. The response times of the SNO sensor studied here are as low as 0.1 s.
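Two calculations underlie panels a and e: converting a measured Voc into pH by inverting a linear calibration line, and finding the largest distance at which a prey fish's bioelectric potential still reaches the roughly 4.5 μV sensitivity of SNO. The Python sketch below shows both. The calibration points and the potential-versus-distance curve are hypothetical placeholders, not data from this study. Interpolating linearly in log(potential) versus log(distance) between samples is an assumption of this sketch, not necessarily the authors' procedure.

```python
import numpy as np

def fit_ph_calibration(ph_values, voc_values):
    """Least-squares line Voc = intercept + slope * pH (slope expected < 0)."""
    slope, intercept = np.polyfit(ph_values, voc_values, 1)
    return slope, intercept

def voc_to_ph(voc, slope, intercept):
    """Invert the calibration line to estimate pH from a measured Voc."""
    return (voc - intercept) / slope

def detection_range(distances_cm, potentials_uv, threshold_uv=4.5):
    """Largest distance at which the potential still reaches the threshold.

    Assumes potentials decrease monotonically with distance and interpolates
    linearly in log(potential) versus log(distance) between samples.
    """
    d = np.asarray(distances_cm, dtype=float)
    v = np.asarray(potentials_uv, dtype=float)
    above = np.nonzero(v >= threshold_uv)[0]
    if above.size == 0:
        return 0.0                      # signal never reaches the threshold
    i = above.max()
    if i == d.size - 1:
        return float(d[-1])             # still detectable at the last sample
    log_d, log_v = np.log(d[i:i + 2]), np.log(v[i:i + 2])
    frac = (np.log(threshold_uv) - log_v[0]) / (log_v[1] - log_v[0])
    return float(np.exp(log_d[0] + frac * (log_d[1] - log_d[0])))

# Hypothetical calibration: Voc falls by 0.05 V per pH unit.
ph = [7.0, 7.5, 8.0, 8.5]
voc = [0.50 - 0.05 * p for p in ph]
s, b = fit_ph_calibration(ph, voc)
print(round(voc_to_ph(0.50 - 0.05 * 8.2, s, b), 3))

# Hypothetical prey signal decaying as 1000 / d**2 (μV, d in cm).
dist = [1, 2, 5, 10, 20, 50]
pot = [1000 / x**2 for x in dist]
print(round(detection_range(dist, pot), 2))
```

For an exact power-law decay, the log–log interpolation recovers the analytical crossing distance, here sqrt(1000/4.5) cm.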
Extended Data Figure 3 | Topography and current map of water-treated SNO thin films. a, Optical image of SNO after applying bias to a selected area, where colour change occurs. b, Current map of the pristine area, where the current is in the nanoampere range. c, The corresponding surface topography of the pristine area. d, Current map of the water-treated area, which is entirely dark compared with the pristine state (b). The current after sensing is decreased to the picoampere range owing to proton uptake. e, The corresponding surface topography of SNO after sensing, where no evidence of corrosion was observed compared with the pristine state (c). Moreover, almost no variance is observed in the surface roughness of the thin film (Supplementary Information section 3). Scale bars are 0.5 μm.
Extended Data Figure 4 | Cyclic voltammograms of SNO thin films. a, Dependence of the water-mediated phase transition in SNO on pH, spanning from an acidic solution (0.01 M citric acid, pH = 2.7) to a basic solution (0.01 M KOH, pH = 12). The transition from SNO to HSNO shifts to more negative potentials with increasing pH, because a greater bias is required to compensate for the reduced proton activity in basic solutions. b, Cyclic voltammograms and accompanying reaction for SNO in 0.01 M citric acid from 1.0 V to −1.0 V (versus Ag/AgCl) at various scan rates. Cathodic current peaks at negative potentials indicate charge transfer as Ni3+ is reduced to Ni2+. The peak position varies as a function of scan rate, indicating that the reaction is kinetically limited by charge and mass transfer. c, Linear relationship between peak cathodic current density (Ip/A) and the square root of the scan rate (v^0.5). The best fit to the Randles–Sevcik equation estimates the number of electrons transferred in the rate-limiting step as 0.95 (Supplementary Information section 4), indicating that the Ni in SNO is almost fully reduced from Ni3+ to Ni2+ upon intercalation.
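Panel c estimates the number of transferred electrons from the slope of the peak current density against the square root of the scan rate. The Python sketch below performs that inversion. It assumes the reversible Randles–Sevcik form jp = 0.4463 (nF)^(3/2) C (Dv/RT)^(1/2) in SI units. The concentration, diffusion coefficient and synthetic scan data in the check are placeholders, because the values used in Supplementary Information section 4 are not given here.

```python
import numpy as np

F = 96485.332   # Faraday constant, C mol^-1
R = 8.314462    # molar gas constant, J mol^-1 K^-1

def electrons_from_randles_sevcik(scan_rates, peak_current_density,
                                  conc, diff_coeff, temp=298.15):
    """Estimate n from the slope of |j_p| versus sqrt(v).

    Assumes the reversible Randles-Sevcik relation
        j_p = 0.4463 * (n F)^(3/2) * C * sqrt(D v / (R T)),
    with v in V s^-1, j_p in A m^-2, C in mol m^-3 and D in m^2 s^-1.
    """
    x = np.sqrt(np.asarray(scan_rates, dtype=float))
    y = np.abs(np.asarray(peak_current_density, dtype=float))  # cathodic peaks are negative
    slope, _intercept = np.polyfit(x, y, 1)
    prefactor = 0.4463 * F**1.5 * conc * np.sqrt(diff_coeff / (R * temp))
    # slope = prefactor * n^(3/2)  =>  n = (slope / prefactor)^(2/3)
    return float((slope / prefactor) ** (2.0 / 3.0))
```

Fitting a straight line rather than forcing it through the origin leaves room for a small background current. Only the slope enters the estimate of n.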
Extended Data Figure 5 | A schematic of the experimental setup for in situ XRR measurement at the Advanced Photon Source. The SNO thin film was connected to a working electrode and submerged in a 0.01 M KOH aqueous solution. A Kapton film was used to avoid the spillage of electrolyte during measurement. The electric potential was applied through the counter-electrode. After the treatment, the XRR signals were collected in situ.
Extended Data Figure 6 | X-ray diffraction and X-ray reflectivity of water-treated SNO. a, Synchrotron X-ray diffraction curves taken from an SNO/LaAlO3 thin film after treatment in a 0.01 M KOH aqueous solution at −4.0 V for 30 s. The (220) peak of pristine SNO (orthorhombic notation) appears at Qz ≈ 3.29 Å−1 as a shoulder at slightly lower scattering vector Qz than the LaAlO3 (002) diffraction peak (pseudocubic notation), demonstrating the epitaxial growth of SNO on LaAlO3. After the water treatment, the epitaxial relationship of SNO on LaAlO3 is preserved. Peak 1 shifts to a lower Qz. Peak 2 appears at Qz = 3.11 Å−1, which corresponds to an increase of the lattice constant by 5.7%. LAO stands for LaAlO3. b, X-ray diffraction profiles of SNO and water-treated SNO over a wide range of scattering angles 2θ. No new peaks appear, in contrast to what has been observed in other oxides, such as cobaltites, upon exposure to water. c, Comparison of synchrotron XRR curves for SNO after applying a bias of −4.0 V for 5 min in 0.01 M citric acid and 0.01 M KOH aqueous solutions. d, A selected area of the XRR curves, normalized to the oscillation peak at Q ≈ 0.19 Å−1 (marked by black arrows in c). Upon treatment, the XRR oscillation period decreases, demonstrating film expansion regardless of solution type, which indicates a general mechanism of phase change of SNO in various aqueous solutions caused by proton incorporation.
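Panels a and d rest on two textbook conversions. First, a Bragg peak at scattering vector Q corresponds to a plane spacing d = 2π/Q, so the relative lattice expansion is Q_pristine/Q_treated − 1. Second, well above the critical angle, the thickness of a uniform film is approximately t ≈ 2π/ΔQ, where ΔQ is the period of the reflectivity (Kiessig) oscillations, so a shorter period means a thicker film. The Python sketch below applies these relations. It illustrates them; it is not the fitting procedure used for the figure.

```python
import math

def lattice_expansion(q_pristine, q_treated):
    """Relative change in plane spacing from a Bragg-peak shift (d = 2*pi/Q)."""
    return q_pristine / q_treated - 1.0

def kiessig_thickness(fringe_period):
    """Approximate film thickness from the XRR fringe period, t ~ 2*pi/dQ.

    Units follow the input: dQ in inverse angstroms gives t in angstroms.
    """
    return 2.0 * math.pi / fringe_period

# Rounded peak positions quoted for panel a: Qz ~ 3.29 and 3.11 inverse angstroms.
print(f"{100 * lattice_expansion(3.29, 3.11):.1f}%")
```

With the rounded peak positions quoted in the caption, this gives about 5.8%, which agrees with the stated 5.7% within rounding.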
Extended Data Figure 7 | Neutron scattering length density profiles of heavy-water-treated SNO. The scattering length density (SLD) profiles were fitted to the data shown in Fig. 2b for the SNO/SiO2/Si films. The surface roughness is nearly unchanged after water treatment (Supplementary Information section 3). The profiles of water-treated and heavy-water-treated samples show similar film expansion. However, the scattering length densities of D2O- and H2O-treated films differ. Because D+ has a larger neutron scattering length than H+, the increase in scattering length density demonstrates the intercalation of D+ from D2O into the lattice after the treatment.
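The opposite effects of H and D on the SLD follow from the sign of their bound coherent scattering lengths: about −3.74 fm for H and +6.67 fm for D. The Python sketch below computes the contribution that x intercalated atoms per formula unit add to the SLD (Σb/V). The formula-unit volume is a hypothetical input. The sketch also ignores the accompanying lattice expansion, which dilutes the host contribution.

```python
# Bound coherent neutron scattering lengths in femtometres.
B_COHERENT_FM = {"H": -3.739, "D": 6.671}

def intercalant_sld_shift(species, x_per_fu, volume_per_fu_a3):
    """SLD added by x intercalated H or D atoms per formula unit.

    SLD = sum(b) / V; with b converted from fm to angstroms (1 fm = 1e-5 A)
    and V in cubic angstroms, the result is returned in units of 1e-6 A^-2.
    """
    b_angstrom = B_COHERENT_FM[species] * 1e-5
    return x_per_fu * b_angstrom / volume_per_fu_a3 * 1e6

# Hypothetical formula-unit volume of 55 cubic angstroms, one atom per formula unit.
for species in ("H", "D"):
    print(species, round(intercalant_sld_shift(species, 1.0, 55.0), 3))
```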
Extended Data Figure 8 | Optical spectra of water-treated SNO. a, b, Reflectivity (a) and absorptivity (b) of pristine and water-treated (−4.0 V, 30 s, in 0.01 M KOH aqueous solution) SNO thin film deposited on a Si substrate. After the treatment, the SNO sensing device shows a reduction in both reflectivity and free-electron absorptivity, concurrent with a large increase in electrical resistance. c, Finite-difference time-domain simulation of optical spectra of water-treated SNO/SiO2/Si thin film devices. The experimental transmissivity and reflectivity of water-treated SNO are compared with finite-difference time-domain simulations of HSNO/SiO2/Si thin film devices, where the optical parameters of samples treated with gas-phase hydrogen were adopted for HSNO. The good agreement between experiment and simulation indicates a phase transition from SNO to HSNO during water treatment with no material decomposition. The thicknesses of SNO and SiO2 were obtained from neutron reflectivity data. The SiO2 layer between the SNO thin film and Si, which forms during film synthesis, contributes to the absorption feature observed at 9.2 μm in the transmission spectra. d, An infrared image (FLIR infrared camera) of an SNO/LaAlO3 sample with water treatment on a selected area. SNO becomes more transparent (red colour) in the infrared at λ = 8 μm after the treatment. The inset shows a photograph of the sample, where the transparency of the treated area can be observed in the visible wavelength range.
Extended Data Figure 9 | Dynamic simulations of SNO–water interactions at an elevated temperature of 500 K. a, Snapshots of the temporal evolution of an SNO surface submerged in water. Images tracking the evolution of a typical water molecule and the NiO6 octahedra in the SNO layer closest to water are shown in the top panels. At 500 K, the surface protonation mechanisms are identical to those at ambient temperature: water at the SNO surface dissociates into free protons and OH−, and a fraction of the free protons migrates to the oxide/water interface and binds to the surface oxygen of SNO. These atomic-scale processes observed in AIMD simulations support the proton accumulation and surface protonation mechanism depicted schematically in Fig. 1a. b, Top view, showing SNO protonation at the end of 4 ps. Compared with the pristine state at 0 ps, the SNO surface maintains structural stability during protonation, even at 500 K (well above ocean temperature). c, The Ni–O pair distribution functions (PDF) calculated at various time intervals. The curves show well-defined sharp peaks, suggesting that the SNO surface remains intact after surface protonation at elevated temperature in an aqueous environment. These results are consistent with the good stability inferred from the temperature-dependent electrical resistivity measurements of the submerged SNO samples.
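A Ni–O pair distribution function like the one in panel c can be computed from atomic coordinates. Histogram the minimum-image Ni–O distances, then normalize by the ideal-gas expectation. The Python sketch below does this for an orthorhombic periodic cell, assuming a cutoff no larger than half the shortest cell edge. It is a generic implementation, not the analysis code used for the AIMD trajectories. The ideal cubic test lattice (a = 3.8 Å) is hypothetical.

```python
import numpy as np

def partial_pdf(pos_a, pos_b, box, r_max, n_bins):
    """Partial pair distribution g_ab(r) in an orthorhombic periodic cell.

    Uses the minimum-image convention, so r_max should not exceed half the
    shortest box edge. Returns bin centres and g_ab(r).
    """
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    box = np.asarray(box, dtype=float)
    disp = pos_a[:, None, :] - pos_b[None, :, :]
    disp -= box * np.round(disp / box)            # minimum image
    r = np.linalg.norm(disp, axis=-1).ravel()
    r = r[(r > 0.0) & (r < r_max)]
    counts, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density_b = len(pos_b) / np.prod(box)
    g = counts / (len(pos_a) * density_b * shell_volume)
    return 0.5 * (edges[1:] + edges[:-1]), g

# Hypothetical ideal cubic perovskite Ni/O sublattice, 2 x 2 x 2 cells.
a = 3.8
cells = np.array([[i, j, k] for i in range(2) for j in range(2) for k in range(2)], float) * a
ni = cells
o = np.vstack([cells + np.array(s) for s in ([a / 2, 0, 0], [0, a / 2, 0], [0, 0, a / 2])])
centres, g = partial_pdf(ni, o, [2 * a] * 3, r_max=a, n_bins=39)
print(round(float(centres[np.argmax(g)]), 2))  # first Ni-O shell sits at a/2
```

For a real trajectory, average g(r) over frames within each time interval. Sharp, persistent peaks indicate that the octahedral framework is intact.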
Extended Data Figure 10 | First-principles calculations of the structure and electron localization of HSNO. a–f, The structure of SNO with 1/4 H/SNO (a–c) and 1 H/SNO (d–f), displayed along the three crystallographic axes of the primitive perovskite structure. The crystallographic axes of the supercell were used in the calculations, where the [110] direction was allowed to relax. In all panels, 12 NiO6 octahedra encompassing the Ni atoms (green) are displayed, with O in red, Sm in purple and H in cyan. The calculations use a √2 × √2 × 2 supercell (that is, with four Ni atoms). g, Change in the volume of SNO at various protonation levels (denoted as protons per SNO formula unit) obtained from DFT calculations. The calculated volume expansion for 1 H/SNO is about 5.9%, which is close to the values obtained from neutron reflectometry measurements and X-ray diffraction. h, The difference in electron density between the relaxed HSNO (SmNiO3H) and the initial state (SNO + H), which clearly shows a depletion (cyan) of charge around the hydrogen (cyan) and an accumulation (yellow) of charge around the closest nickel (green) and oxygen (red), which are part of the octahedron that expands upon hydrogen incorporation into the lattice. In this calculation, the c axis was allowed to relax while the other two (in-plane) lattice constants were fixed. For clarity, only the spin-down charge density is plotted because the electron incorporation results in a negative total magnetic moment (see the projected density of states of 1/4 H/SNO in Fig. 3d). i, The (111) plane of the contour plot situated within the supercell.
Unexpectedly large impact of forest management
and grazing on global vegetation biomass

Karl-Heinz Erb, Thomas Kastner, Christoph Plutzar, Anna Liza S. Bais, Nuno Carvalhais, Tamara Fetzel, Simone Gingrich, Helmut Haberl, Christian Lauk, Maria Niedertscheider, Julia Pongratz, Martin Thurner & Sebastiaan Luyssaert


Carbon stocks in vegetation have a key role in the climate system. However, the magnitude, patterns and uncertainties of carbon stocks and the effect of land use on the stocks remain poorly quantified. Here we show, using state-of-the-art datasets, that vegetation currently stores around 450 petagrams of carbon. In the hypothetical absence of land use, potential vegetation would store around 916 petagrams of carbon under current climate conditions. This difference highlights the massive effect of land use on biomass stocks. Deforestation and other land-cover changes are responsible for 53–58% of the difference between current and potential biomass stocks. Land management effects (the biomass stock changes induced by land use within the same land cover) contribute 42–47%, but have been underestimated in the literature. Therefore, avoiding deforestation is necessary but not sufficient for mitigation of climate change. Our results imply that trade-offs exist between conserving carbon stocks on managed land and raising the contribution of biomass to raw material and energy supply for the mitigation of climate change. Efforts to raise biomass stocks are currently verifiable only in temperate forests, where their potential is limited. By contrast, large uncertainties hinder verification in tropical forests, where the largest potential is located, pointing to challenges for the upcoming stocktaking exercises under the Paris Agreement.

The amount of carbon stored in terrestrial vegetation is a key component of the global carbon cycle. Changes in carbon stored in vegetation biomass have a large effect on atmospheric CO2 concentrations, through either sequestration or release of carbon. The urgency to conserve and, where appropriate, enhance the carbon reservoirs of terrestrial vegetation has long been recognized and is reflected by, for example, the inclusion of the land sector in the report of the United Nations Framework Convention on Climate Change (UNFCCC), the program for Reducing Emissions from Deforestation and Forest Degradation (REDD+), and the acknowledgement of biomass stocks as an essential climate variable. Therefore, monitoring changes in biomass stocks is key for securing progress towards the commitment to keep global warming below 1.5 °C.

Although aboveground biomass stocks are straightforward to measure at the site level, their assessment at landscape-to-global scales is time consuming and costly, and requires extrapolations. Remote sensing is well established for wall-to-wall mapping of biomass stocks, but the methodological differences between remote-sensing products and their scale mismatch with ground data hamper their comparability. Consequently, and despite efforts to improve observational databases, biomass stocks and their spatial distribution remain uncertain at the global scale (Extended Data Fig. 1).


Many studies of global change focus on changes in vegetation biomass without quantifying absolute amounts of biomass stocks. Such approaches are indispensable for tracing the role of vegetation in the carbon cycle over time, but do not allow calculations of, for example, restoration potentials. Furthermore, large gaps in our knowledge remain concerning the impact of various land-use activities on biomass stocks.

Informed design, implementation, monitoring and verification of land-based climate-change mitigation strategies require comprehensive and systematic stocktaking of the carbon stored in vegetation. Beyond accounts of carbon-stock changes, stocktaking also needs to consider the potential and actual biomass stocks of terrestrial vegetation; the full impact of land use on biomass stocks, that is, both land-cover conversion and land management; and the uncertainty of biomass stock estimates. Here, we compile such information, complementary to current approaches that quantify actual biomass stocks (Extended Data Fig. 2).

We present seven global maps of the actual biomass stocks (Extended Data Fig. 3), here defined as the terrestrial, living, aboveground and belowground vegetation biomass measured in grams of carbon, based on remote sensing and inventory-derived information. Ecological literature on biomass stocks of natural zonal vegetation (Supplementary Tables 1, 2), and remote-sensing-derived information on natural vegetation remnants in ecozones, was combined with state-of-the-art biome maps (Methods), accounting for areas without vegetation, to obtain six reconstructions of potential biomass stocks. Potential biomass stocks are defined as the biomass stocks that would exist without human disturbance under current environmental conditions (Methods, Extended Data Fig. 4). Because actual and potential biomass stocks both refer to the same environmental conditions, their difference isolates the effect of land use on biomass stocks (Methods).

Variation within both sets of maps was interpreted as an indicator of uncertainty, assuming that the uncertainty results from differences between approaches rather than from measurement errors within a single approach. From the variation between the seven actual biomass estimates, we calculated a detection-limit map for stock changes (Methods). Permuting the six potential and seven actual maps resulted in 42 pairs, which enabled us to quantify the effects of land use on biomass stocks. Note that spatial variability in biomass stocks at the landscape level (for example, owing to age-class structure, variation in soil fertility or soil-water availability) is accounted for differently in the estimates of potential and actual biomass stocks (Methods). This could introduce a bias of unknown sign and size when interpreting the fine-scale spatial patterns of the biomass-stock reduction maps.
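The pairing step, together with the median, inner quartiles and coefficients of variation reported below, can be sketched in Python. This version works on global totals rather than on the five-arc-minute grids, which is sufficient for global differences because the difference of sums equals the sum of per-cell differences. It uses the sample standard deviation for the coefficient of variation, which is an assumption. The input lists in the check are hypothetical, because the individual map totals are not listed in this section.

```python
from itertools import product
from statistics import mean, median, quantiles, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

def land_use_effects(potential_totals, actual_totals):
    """Differences for every potential-actual pair (6 x 7 = 42 in this study).

    Returns absolute differences (same unit as the inputs) and relative
    differences in per cent of the potential stock.
    """
    pairs = list(product(potential_totals, actual_totals))
    absolute = [p - a for p, a in pairs]
    relative = [100.0 * (p - a) / p for p, a in pairs]
    return absolute, relative

def median_and_inner_quartiles(values):
    """Median plus the first and third quartiles of the pairwise results."""
    q1, _, q3 = quantiles(values, n=4)
    return median(values), (q1, q3)
```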
Figure 1 | Differences in biomass stocks of potential and actual vegetation induced by land use. a, Spatial pattern of land-use-induced biomass stock differences (expressed as a percentage of potential biomass stocks), mean of all 42 estimates. b, Box plot of all 42 estimates of the global potential–actual biomass-stock difference. Whiskers indicate the range, the box shows the inner 50% of estimates (interquartile range), and the line indicates the median of all estimates; the two dots represent the results of the two approaches used for the attribution of biomass stock differences to land-cover conversion and land management. c, Actual and potential biomass stocks in the world's major biomes (see Extended Data Fig. 5f), and the role of land-cover conversion and management in explaining their difference. Error bars indicate the range of the estimates for potential (grey; n = 6) and actual (black; n = 7) biomass stocks. 'Ambiguous' denotes cases attributed differently in the assessments based on FRA and ref. 16.


Two of the actual biomass stock maps (based on the Global Forest Resources Assessment (FRA) and ref. 16) were established on the basis of a present-day land-use dataset (Methods). They therefore enabled the systematic separation of land-cover conversion effects, that is, changes in biomass stocks due to the conversion of pristine ecosystems into artificial grassland, cropland or infrastructure, from land management effects, that is, management-induced changes that occur within unaltered land-cover types, such as forests, savannahs and other natural grasslands (Extended Data Fig. 2).

At the global scale, the biomass stocks of the currently prevailing vegetation have a mean of 450 petagrams of carbon (PgC; range of the seven estimates: 380–536 PgC, coefficient of variation: 11%). By contrast, biomass stocks of potential vegetation have a mean of 916 PgC (range of the six estimates, individually adjusted to actual biomass stock maps: 771–1,107 PgC, coefficient of variation: 12%). Therefore, our analysis suggests that land use halves the amount of carbon that is potentially stored in terrestrial biomass (Fig. 1). Irrespective of the climate zone, the difference in biomass between potential and actual stocks mostly follows the pattern of global agriculture, with hotspots in South and East Asia and Europe, as well as in the eastern parts of North and South America (Fig. 1a). Considerable differences between potential and actual biomass stocks also occur in regions dominated by forest and natural grassland use (Extended Data Fig. 5a, b). Given that biomass stocks are a function of net primary production and turnover time, a 50% reduction in the turnover time and a 10% land-use-induced decrease in net primary production together explain the reduced biomass stocks.
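If biomass stock B is taken as the steady-state product of net primary production (NPP) and turnover time τ, the two land-use effects multiply. A rough check under that assumption, ignoring spatial covariation between the two factors, is:

```latex
\frac{B_{\mathrm{act}}}{B_{\mathrm{pot}}}
  = \frac{\mathrm{NPP}_{\mathrm{act}}}{\mathrm{NPP}_{\mathrm{pot}}}
    \cdot \frac{\tau_{\mathrm{act}}}{\tau_{\mathrm{pot}}}
  = (1 - 0.10)\,(1 - 0.50) = 0.45
```

This corresponds to a reduction of about 55%, the same order as the roughly 50% potential–actual difference reported above.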

The 42 pairs of potential–actual biomass-stock differences have a median of 49%, with the inner quartiles ranging from 43 to 55%. This implies an average impact on biomass stocks of 447 PgC (median; inner quartiles: 375–525 PgC; Fig. 1b).


2 | NATURE | VOL 000 | 00 MONTH 2017 


[Figure 2 panels: a, actual versus potential biomass stock (kgC m−2) per land-use type (cropland, infrastructure, artificial grassland, natural grassland with and without trees, wilderness without trees, used tropical, subtropical, temperate and boreal forest); b, contribution of land-cover conversion and land management (forest management and grazing) to the biomass-stock difference]


Figure 2 | Contribution of land-use types to the difference between potential and actual biomass stocks. a, Potential and actual biomass stock per unit area per land-use type for the assessments based on FRA (dark colours) and ref. 16 (light colours). Circle size is proportional to the global extent of the individual land-use type. The diagonal line indicates the 1:1 relationship between actual and potential biomass stocks (no change, green colour). b, Relative contribution of land-cover conversion and land management to the difference between potential and actual biomass stocks, calculated from the assessments based on FRA and ref. 16. 'Ambiguous' denotes cases attributed differently in the two assessments (for absolute values refer to Extended Data Table 1).


The approaches based on FRA and ref. 16 enable the separation of effects of land-cover conversion and land management (Fig. 1c). Where land cover has been converted (Methods), actual biomass stocks reach only 10% of potential biomass stocks per unit area (Fig. 2a). Conversion, however, affects a relatively small area of 28 million km². By contrast, in an area of 56 million km² of managed, but not converted, ecosystems, the actual biomass stocks reach 60 to 69% of the potential biomass stock per unit area. As a consequence, land-cover conversion (53–58%) and land management (42–47%) contribute almost equally to the overall difference between potential and actual biomass stocks. Forest management contributes two-thirds and grazing one-third of the management-induced difference in biomass stocks (Fig. 2b and Extended Data Table 1).

Figure 3 | Uncertainty of biomass stock estimates. a, Latitudinal profile of all seven actual (yellow) and all six potential (blue) biomass stock estimates; the lines indicate the respective medians, and the shaded areas the ranges. b, Ranges of potential and actual biomass stocks per land-use type, intersected at the median (n = 6 for potential, n = 7 for actual biomass stocks). In the absence of consistent land-use information for all layers, biomass stock changes were estimated on grid cells dominated (>85%) by a single land-use type and therefore deviate slightly from the estimates displayed in Fig. 2. The diagonal line indicates the 1:1 relationship where actual and potential biomass stocks are equal. c, Detection limit of annual changes in actual biomass stocks. Changes in biomass stocks need to exceed the detection limit in order to be detectable, for example, in monitoring or stocktaking efforts such as those foreseen in the Paris Agreement.

The large impact of land management on vegetation biomass suggests that estimates of historical land-use change emissions are incomplete if only deforestation is considered (Extended Data Table 2). Contextualizing our results with accounts of the global terrestrial carbon balance suggests that pre-industrial land-use impacts on biomass stocks were considerable (115–425 PgC of the total difference of 375–525 PgC; Extended Data Table 3), corroborating model-based findings. These larger pre-industrial emissions are consistent with recent estimates of the global carbon budget that consider strong but uncertain natural sink processes, such as the build-up of peat (see Supplementary Information).

Alternatively, or in addition, they indicate an underestimation of the strength of the current terrestrial carbon sink, as suggested by model-based studies. To reduce the large uncertainty range of current estimates, future research will need to scrutinize the role of land management, in particular in non-forest ecosystems, which are often ignored in global carbon studies. It is important to note that the difference between potential and actual biomass stocks is only a rough proxy for cumulative emissions from land use. Firstly, it does not include soil carbon and product pools. Including soil carbon would probably increase the difference, whereas including products would decrease it. There are large uncertainties for these two components, but their effects are generally estimated to be small in comparison to biomass changes. Secondly, the difference between actual and potential carbon stocks is not identical to stock changes between two points in time. Both actual and potential biomass stocks refer to the same environmental conditions; therefore, their difference integrates two effects: cumulative land-use emissions, and land-use-induced reductions in the carbon sequestration that would otherwise result from environmental changes (Extended Data Fig. 2 and Supplementary Information). Therefore, cumulative emissions are probably smaller than the overall impact of land use on biomass stocks, depending on the uncertain strength of the environmental effect.

The large importance of land management for terrestrial biomass stocks has far-reaching consequences for climate-change mitigation. The difference between actual and potential biomass stocks can be interpreted as the upper boundary of the carbon-sequestration potential of terrestrial vegetation. Long-term changes in growth conditions, for example, due to large-scale alterations in hydrological conditions or severe soil degradation, could lower this potential. Conversely, climate change could increase the future potential biomass stocks of ecosystems, but this effect is highly uncertain. Managing vegetation carbon so that it reaches its current potential would store the equivalent of 50 years of carbon emissions at the current rate of 9 PgC per year (PgC yr−1). This is not feasible, however, because it would mean taking all agricultural land out of production. More plausible potentials are much lower (Extended Data Table 4); for example, restoring used forests to 90% of their potential biomass would absorb fossil-fuel emissions for 7–12 years. However, such strategies would entail severe reductions in annual wood harvest volumes, because optimizing forest harvest reduces forest biomass compared to potential biomass stocks. By contrast, widely supported plans to substantially raise the contribution of biomass to raw material and energy supply, for example, in the context of the so-called bioeconomy, imply a need for increased harvests. From the perspective of greenhouse gas emissions, the challenge for land managers is to maintain or increase biomass productivity while at the same time maintaining or even enhancing biomass stocks.
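The restoration scenarios above raise each managed cell towards a target fraction of its potential stock and express the gain in years of emissions at about 9 PgC yr−1. The Python sketch below implements that calculation. It assumes per-cell stocks already expressed in PgC (that is, multiplied by cell area). The cell values in the check are hypothetical rather than taken from Extended Data Table 4.

```python
import numpy as np

def restoration_gain(potential, actual, target_fraction=0.9, mask=None):
    """Carbon gained if each selected cell is raised to target_fraction of its
    potential stock; cells already at or above the target contribute nothing.

    potential, actual: per-cell stocks in PgC (already multiplied by cell area).
    mask: optional boolean array selecting, for example, used-forest cells.
    """
    potential = np.asarray(potential, dtype=float)
    actual = np.asarray(actual, dtype=float)
    gain = np.clip(target_fraction * potential - actual, 0.0, None)
    if mask is not None:
        gain = gain[np.asarray(mask, dtype=bool)]
    return float(gain.sum())

def years_of_emissions(carbon_pgc, emission_rate_pgc_per_yr=9.0):
    """Express a carbon amount as years of emissions at a constant rate."""
    return carbon_pgc / emission_rate_pgc_per_yr
```

For scale, a gap of about 450 PgC divided by 9 PgC yr−1 gives 50 years, matching the full-potential figure in the text.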

Although the uncertainty ranges of actual and potential biomass stocks are typically around 35% of the median estimate, the two sets of estimates rarely overlap along the latitudinal north–south gradient (Fig. 3a). Whereas the potential biomass stock shows a similar uncertainty level across most relevant biomes, the uncertainty patterns of the actual biomass stock are noteworthy. Actual biomass-stock estimates are particularly uncertain in the tropics (Fig. 3b, c), a region that contains more than half of the current global biomass stocks (Fig. 1c).

The spatial uncertainty patterns are relevant for designing and monitoring climate-change mitigation efforts such as carbon-stock restoration. Whereas industrialized countries have access to much finer and more robust data than those used here, most developing countries have to rely on global data, such as those used in this study. The uncertainty range could be narrowed if a single robust, validated method were applied continuously in the stocktaking efforts. Indeed, technical facilities for deriving improved estimates of actual biomass stocks will soon become available (for example, the Biomass mission of the European Space Agency and the Global Ecosystem Dynamics Investigation mission of the National Aeronautics and Space Administration, as well as integration efforts (http://globbiomass.org/)). Current planning, however, suggests that this capacity will not be fully operational before the inception of the stocktaking processes. Until then, restoration planning and monitoring will have to rely on existing global datasets and their present-day uncertainties.

In boreal and temperate forests, restoration efforts would be 
detectable even with the present-day uncertainties (Fig. 3c). But three- 
quarters of the global restoration potential can be found in tropical 
regions (Fig. 1c and Extended Data Table 4), where biomass stocks 
would need to increase by over 750 gC m~ yr“ for 10 consecutive 
years to be detectable against variation between global data. A large 
threat to biomass-stock conservation comes from the use of dry trop- 
ical forests and savannahs, in particular in Africa, where these biomes 
have been identified as having a high potential for increasing global 
agricultural production, to improve global food security or bioenergy 
supply”®. Given current detection limits for tropical biomes, both the 
intensification of land use in dry tropical forests and savannahs and the 
restoration efforts in tropical forests are questionable because of the 
possibility of undetectable carbon debts from land-use intensification”
or unverifiable gains from carbon restoration measures. 

Our analysis suggests that land-use impacts were already pronounced
in the pre-industrial period and reveals that effects of forest
management and grazing on vegetation biomass are comparable in 
magnitude to the effects of deforestation. Therefore, a focus on biomass 
stocks helps to recognize options for land-based greenhouse gas miti- 
gation beyond the mere conservation of forest area. Our findings also 
suggest that important trade-offs in climate-change mitigation need 
to be tackled. The scientific and political focus on forest protection 
and productivity increases needs to be complemented by analyses of 
the interactions between land use and the carbon state of ecosystems. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


We established six datasets for potential biomass stocks and seven datasets for 
actual biomass stocks. All maps were constructed at the spatial resolution of five 
arc minutes. Datasets were chosen on the basis of their coverage (that is, only maps 
covering large parts of the globe were included) and their plausibility. Given that 
most datasets did not cover all land-use types, all regions of the globe, or all relevant 
biomass stocks, some completion exercises were performed to generate consistently 
comparable datasets. These relied on different types of evidence, such as land-use 
information, information from census statistics, remotely sensed information, and
modifications of assumptions on biomass-stock density of different land-use cate- 
gories and ecozones. The construction of the individual maps is described below. 
Actual biomass-stock maps 1 and 2. Actual biomass-stock maps 1 and 2 (based on 
FRA and ref. 16, respectively; see Extended Data Fig. 3a, b) enabled the isolation 
of the effect of individual land uses. They were based on a consistent land-use 
dataset, derived and modified from previous work”. The dataset was adjusted to 
newly available statistical data on the national extent of forests!> and cropland*". 
Information on cropland types*” was used to identify permanent crops; other trees
within cropland® are not included in the cropland layer, in line with FAO
definitions*!. Unused land was identified on the basis of previous assessments
(for example, delineating unproductive land with a productivity threshold of
20 gC m−2 yr−1)!9°, information on permanent snow from a land cover product*4,
a thematic footprint map*? and a map of intact forests**. All land not classified as
infrastructure, cropland or forestry was defined as grazing land. Grazing land was
split into three layers: (1) artificial grasslands, that is, grasslands on potentially
forested areas; (2) natural grasslands with trees, including savannahs and other
wooded land; and (3) natural grasslands without trees (for example, temperate
steppes), on the basis of land cover information on the extent of land under agri-
cultural management™, biome maps*”-*’ and MODIS data“ on fractional tree
cover, applying a tree-cover threshold of 5% at a resolution of 500 m to discern grazing
land with and without trees, in fractional cover representation. The final land-use
dataset discerns the following classes. Unused land: (1) non-productive and snow; 
(2) wilderness, no trees; (3) unused forests. Used land: (4) infrastructure; 
(5) cropland; (6) used forests; (7) artificial grassland; (8) natural grassland, no trees; 
(9) natural grassland with trees. 
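The three-way split of grazing land described above can be illustrated as a per-pixel rule followed by aggregation to the coarse grid. This is a minimal Python sketch, not the authors' ArcGIS/MATLAB workflow: it assumes that each 500 m pixel already carries a boolean flag for potentially forested land and a MODIS-style fractional tree cover in percent, and it treats a cover of exactly 5% as "with trees". The names `GrazingLayer`, `grazing_layer` and `layer_fractions` are hypothetical.

```python
from enum import Enum


class GrazingLayer(Enum):
    # Class numbers follow the final land-use list in the text.
    ARTIFICIAL_GRASSLAND = 7
    NATURAL_GRASSLAND_NO_TREES = 8
    NATURAL_GRASSLAND_WITH_TREES = 9


# Tree-cover threshold (percent) applied at 500 m; using ">=" at the threshold is an assumption.
TREE_COVER_THRESHOLD_PCT = 5.0


def grazing_layer(potentially_forested: bool, tree_cover_pct: float) -> GrazingLayer:
    """Assign one 500 m grazing-land pixel to one of the three grazing layers."""
    if potentially_forested:
        # Grassland on a potentially forested site counts as artificial grassland.
        return GrazingLayer.ARTIFICIAL_GRASSLAND
    if tree_cover_pct >= TREE_COVER_THRESHOLD_PCT:
        return GrazingLayer.NATURAL_GRASSLAND_WITH_TREES
    return GrazingLayer.NATURAL_GRASSLAND_NO_TREES


def layer_fractions(potentially_forested, tree_cover_pct):
    """Fraction of grazing-land pixels in each layer within one coarse grid cell."""
    layers = [grazing_layer(f, c) for f, c in zip(potentially_forested, tree_cover_pct)]
    if not layers:
        return {layer: 0.0 for layer in GrazingLayer}
    return {layer: sum(l is layer for l in layers) / len(layers) for layer in GrazingLayer}
```

For four pixels with flags `[True, False, False, False]` and tree covers `[0, 10, 2, 5]`, the cell would be one quarter artificial grassland, one half natural grassland with trees and one quarter natural grassland without trees.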

To each land-use unit, typical biomass-stock density values from the literature or 
census statistics were assigned. For forests, the FRA-based map uses national-level 
data from the global Forest Resource Assessment’’. By contrast, the map based on 
ref. 16 uses data from forest inventories and site data. The estimate from ref. 16 is 
higher, particularly in tropical forests, but slightly lower for boreal forest biomass
stocks, resulting in higher total forest biomass stocks overall (361 PgC versus
298 PgC, for forests only). National forest biomass stock data were downscaled
to the grid using information on tree height from a global database"’, following the
finding that tree height is among the critical factors determining biomass stocks
and can thus serve as a proxy for the spatial allocation of biomass-stock densities at
large scales'*“?. Minimum biomass-stock density for forests was set to 3 kgC m−2
to discern forests from scrub vegetation and other wooded land. For grassland-tree
mosaics, no census data on biomass stocks are available. For some countries, data
on wood stocking (in m³) of other wooded land are available’’, showing a range
between 0.4% and 21% (interquartile range) of forest biomass stocks per unit
area, with outliers of >90%. World-region aggregates of biomass-stock densities
on other wooded land range between 15% and 28% of the values for forests, with a
world average of 23%. In order to consider non-woody components, which are
more important on other wooded land than in forests, as well as to produce
a conservative estimate, we assumed that biomass stocks per unit area on other 
wooded land were 50% of the corresponding values for forests at the national level. 
For herbaceous vegetation units (artificial grassland on potential forest sites, crop- 
land and natural grassland without trees), we assumed that biomass stocks were 
equal to the annual amount of net primary production!*. For permanent cropland,
we added 3 kgC m−2 for tree-bearing systems and 1.5 kgC m−2 for shrub-bearing
systems to account for woody above- and belowground compartments, in line 
with estimates in the literature (see Supplementary Table 3). In the absence of data, 
and owing to the small extent of this land-use type, biomass stocks on infrastruc-
ture areas were calculated as one sixth of potential biomass stocks. This assumes
that one third of the infrastructure area is vegetated, half of it with trees and half
with artificial grassland (the latter was assigned no additional biomass, as the potential
biomass stocks already provide a progressive estimate). Effects of land degradation
on natural grassland (with and without trees) were modelled on the basis of losses 
in net primary productivity derived from ref. 43. 
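The density-assignment rules in this paragraph reduce to a few arithmetic steps. The Python sketch below is an illustration under stated assumptions, not the authors' code: forest downscaling is assumed to be proportional to tree height with the national total preserved before the 3 kgC m−2 floor is applied (the floor can raise the total slightly), and all function and constant names are hypothetical. Densities are in kgC m−2; NPP is in kgC m−2 yr−1, and one year of NPP stands in for the herbaceous stock, as in the text.

```python
import numpy as np

MIN_FOREST_DENSITY = 3.0                      # kgC m^-2, minimum density for forests
OTHER_WOODED_LAND_SHARE = 0.5                 # other wooded land: 50% of national forest density
WOODY_ADDITION = {"tree": 3.0, "shrub": 1.5}  # kgC m^-2 added for permanent crops
# One third of infrastructure area is vegetated; half of that carries trees at potential density.
INFRASTRUCTURE_SHARE = (1.0 / 3.0) * 0.5


def downscale_forest(national_stock_kgc, cell_area_m2, tree_height_m):
    """Spread a national forest stock over grid cells in proportion to tree height."""
    area = np.asarray(cell_area_m2, dtype=float)
    height = np.asarray(tree_height_m, dtype=float)
    # density_i = S * h_i / sum_j(h_j * a_j), so sum_i(density_i * a_i) == S before flooring
    density = national_stock_kgc * height / np.sum(height * area)
    return np.maximum(density, MIN_FOREST_DENSITY)


def herbaceous_density(npp):
    """Cropland, artificial grassland and natural grassland without trees: one year of NPP."""
    return npp


def permanent_crop_density(npp, system):
    """Herbaceous stock plus a fixed woody component; `system` is 'tree' or 'shrub'."""
    return npp + WOODY_ADDITION[system]


def other_wooded_land_density(national_forest_density):
    return OTHER_WOODED_LAND_SHARE * national_forest_density


def infrastructure_density(potential_density):
    """One sixth of the potential density (1/3 vegetated x 1/2 with trees)."""
    return INFRASTRUCTURE_SHARE * potential_density
```

For example, a national stock of 40 kgC spread over two 1 m² cells with tree heights of 10 m and 30 m gives densities of 10 and 30 kgC m−2, and a cell with a potential density of 12 kgC m−2 would receive about 2 kgC m−2 as infrastructure.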

Actual biomass stock maps 3 and 4. Actual biomass stock maps 3 and 4 were 
based on refs 6 and 7, respectively, in combination with ref. 8; see Extended Data 
Fig. 3c, d. Two remote-sensing-based maps were created by combining 
independent remote-sensing products for tree vegetation (including foliage) and 
expanding them to account for belowground and herbaceous compartments where 
necessary. At the global scale, five distinct regions can be discerned with regard to
the availability of global remote-sensing-based products. For the northern boreal
and temperate forests one product is available*. A large part of the tropical zone
is covered by two datasets®’. These two datasets differ markedly from each other
as well as from in situ data”!°. A smaller fraction
of the tropical zone, including a large part of Australia, South America and South
Africa, is covered by only one of the remote-sensing datasets®, whereas a region in
China is covered by two datasets®*. For some regions (the southernmost part of
Australia, parts of Oceania), no remote-sensing data are available. In these regions,
map 1 was used in the compilation of maps 3 and 4. Map 3 was constructed by
complementing forest biomass stock data for the temperate and boreal zones* with 
data on net primary productivity" in order to account for herbaceous vegetation, 
applying a forest-non-forest mask derived from the GLC2000 land cover map™. 
The resulting map for the northern forests was combined with the biomass stock 
map for the tropical zone®. The latter was also extended with data on net primary 
productivity!’ to account for the herbaceous fractions. For map 4, we replaced 
values for woody vegetation from map 3 with data from ref. 7, where available. 
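The layer mosaicking for maps 3 and 4 can be sketched as masked selection with a fallback. This Python sketch is hypothetical and simplified: it assumes all inputs are NumPy arrays on one common grid, that missing coverage is encoded as NaN, and that the forest/non-forest mask (GLC2000 in the text) is a per-cell boolean; how woody and herbaceous compartments are actually merged in the study may differ.

```python
import numpy as np


def fill_missing(primary, fallback):
    """Use `primary` where it has data and `fallback` where it is NaN."""
    primary = np.asarray(primary, dtype=float)
    return np.where(np.isnan(primary), fallback, primary)


def composite_map(woody, npp, forest_mask, fallback_map):
    """Mosaic one actual-biomass map from partial layers on the same grid."""
    # Forest cells take the remote-sensing woody biomass; non-forest cells take NPP
    # as a stand-in for herbaceous biomass (an assumption about how layers combine).
    combined = np.where(forest_mask, woody, npp)
    # Cells without remote-sensing coverage fall back to another map (map 1 in the text).
    return fill_missing(combined, fallback_map)
```

For map 4, the woody layer itself would first be patched, for example `fill_missing(woody_ref7, woody_map3)` with hypothetical array names, before calling `composite_map`.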
Actual biomass stock maps 5 and 6. Grid-cell-based minima and maxima of the 
remote-sensing maps; see Extended Data Fig. 3e, f. While maps 3 and 4 serve as
best guesses from the available remote-sensing products, these two maps were based
on a statistical approach, calculating the grid-cell-based minima and maxima of 
various remote-sensing input data, enabling an assessment of the absolute upper 
and lower boundaries, breaking up the auto-correlated nature of remote-sens- 
ing-derived maps. Maps 3 and 4 were used as input. Furthermore, a modulation 
was calculated for the area covered only by the map of ref. 8. This map uses a forest 
mask derived from GLC2000™. In order to reflect the uncertainty of this land cover 
map, we used an alternative forest mask to calculate new values at the grid level. We 
projected the grid-based biomass stock density (biomass per unit area) values from 
ref. 8 to the MODIS fractional tree cover dataset”. Additionally, alternative maps 
for net primary productivity were used to complement these biomass stock maps 
for woody vegetation, derived by a vegetation model*, a numerical model*® and 
from remote-sensing estimates*’. Map 5 was calculated as the cell-based minima, 
map 6 as the cell-based maxima of these input layers. 
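Maps 5 and 6 are cell-wise extremes of a stack of input layers. A minimal NumPy sketch, assuming the candidate layers (remote-sensing woody biomass variants combined with the alternative NPP products) have already been harmonized into complete biomass maps on one grid, with NaN for no data; the function name is hypothetical.

```python
import warnings

import numpy as np


def cellwise_envelope(layers):
    """Cell-wise minimum and maximum across candidate biomass layers (NaN = no data)."""
    stack = np.stack([np.asarray(layer, dtype=float) for layer in layers])
    with warnings.catch_warnings():
        # Cells where every layer is NaN stay NaN; silence NumPy's all-NaN warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        low = np.nanmin(stack, axis=0)
        high = np.nanmax(stack, axis=0)
    return low, high
```

Using `nanmin`/`nanmax` lets a cell covered by only some of the input layers still receive a minimum and a maximum from the layers that do cover it.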

Actual biomass stock map 7. A seventh map was taken from the literature“*; see 
Extended Data Fig. 3g. 

No robust empirical information is available that would allow resolution of the 
discrepancies between the two datasets on the basis of consistent, spatially explicit 
land-use information (maps 1 and 2). The difference between these two estimates 
was 79 PgC. Both assessments are inventory-based, but in ref. 16 long-term 
measurements of network plots for the tropical regions were used to compensate for 
data gaps, whereas FRA reports national data that are often based on remote sensing. 
The ability of global remote-sensing data (benchmark maps) to resolve this
discrepancy is still limited. The two available high-resolution datasets covering the
tropics®’ differ markedly from each other and from in
situ data?!°. The estimate from ref. 16 is situated between these two
estimates, whereas the estimate from the FRA is situated below the minimum. 
However, a study based on alternative site data!! corrected both maps downwards, 
close to the grid-based minimum of both accounts, better matching the FRA-based 
assessment. 

Potential biomass stock maps. Potential vegetation refers to a hypothetical state of 
vegetation, which would prevail without human activities but under current climate 
conditions*’. We compiled five maps following an ecozone approach, allocating 
typical carbon densities of zonal vegetation to state-of-the-art ecozone maps for 
current climate conditions*”~», with current coastlines and current permanent ice 
cover. The carbon-density values refer to landscape-level averages and take effects 
of age distribution and natural disturbance into account. We used high-resolution 
data from the ESA GlobCover 2009 Project* to exclude small water bodies and 
small-scale bare areas, with the exception of ecosystems where carbon-stock values 
already take bare areas into account, for example, steppes and thorn savannahs. 
Small-scale variability caused by, for example, the spatial variability of edaphic 
conditions or water availability (azonal vegetation) was neglected. No information 
is available that allows us to determine whether this omission, or sampling biases 
in the input data, introduces an upward or downward bias in the maps. Input data 
could be biased towards high values if sampling favoured undisturbed, old-growth
stands, or towards lower values, if the data were derived from human-disturbed
vegetation in the absence of natural vegetation remnants for certain ecosystem
types. The comparison with other estimates shows that our data are well in line with
the literature (Extended Data Fig. 1), suggesting that such biases play a minor role.
Furthermore, approximations of upper and lower estimates for potential vegetation 
were calculated to determine realistic ranges of global biomass stocks. 

Potential biomass stock maps 1 and 2. IPCC-based maps, FRA-adjusted or 
adjusted to ref. 16; see Extended Data Fig. 4a, b. Two maps were constructed 
to consistently match the actual biomass stock maps 1 and 2. They build on
best-available estimates of potential, landscape-averaged biomass-stock densities
for zonal vegetation, mainly from IPCC values*!, with the exception of boreal 
forests. For boreal forests, owing to large uncertainties?*”>, the maximum values 
of biome-wide actual biomass stocks per unit area between 1990 and 2007!° were 
used to derive a conservative estimate. Map 1 was subsequently adjusted at the 
grid level so that potential biomass stock values below actual biomass stock levels 
matched the actual biomass stocks in the FRA-based map. For map 2, this adjust- 
ment was done with the map based on ref. 16. 
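The two adjustments in this paragraph can be written as one-liners. In this Python sketch the yearly biome-wide densities are hypothetical inputs keyed by year, and the grid adjustment assumes the potential and actual maps are aligned arrays; map 1 would be adjusted against the FRA-based actual map and map 2 against the map based on ref. 16.

```python
import numpy as np


def boreal_potential_density(biome_density_by_year):
    """Conservative boreal value: the largest biome-wide actual density on record."""
    return max(biome_density_by_year.values())


def adjust_potential(potential, actual):
    """Raise potential stocks to the actual stocks wherever potential < actual."""
    return np.maximum(np.asarray(potential, dtype=float), np.asarray(actual, dtype=float))
```

Because `np.maximum` works element-wise, cells where the potential already exceeds the actual stock are left unchanged.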

Potential biomass stock maps 3 and 4. Maps 3 and 4 were based on classic 
ecological data: cell-based minima and maxima; see Extended Data Fig. 4c, d. 
Two further maps were calculated by using biomass stock density values?3*"4 for 
natural, zonal vegetation, from synthesis efforts of site-specific data, for example, 
from the International Biological Programme*. Similar to maps 1 and 2, these 
values were allocated to the three biome maps*”~*”, and the cell-based minima 
(map 3) and maxima (map 4) of all three maps were calculated. 
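Allocating zonal density values to several biome maps and taking cell-wise extremes can be sketched as a lookup-table operation. This sketch assumes integer-coded biome maps with non-negative class codes and one density table per map (each biome map uses its own classification); the function names are hypothetical and any densities passed in would be illustrative only.

```python
import numpy as np


def allocate(biome_map, density_by_class):
    """Look up a biomass density (kgC m^-2) for each cell of an integer-coded biome map."""
    codes = np.asarray(biome_map, dtype=int)   # class codes assumed non-negative
    lut = np.full(max(max(density_by_class), codes.max()) + 1, np.nan)
    for code, density in density_by_class.items():
        lut[code] = density
    return lut[codes]                          # classes without a value become NaN


def classic_minmax(biome_maps, density_tables):
    """Potential maps 3 and 4: cell-wise minimum and maximum over the allocated maps."""
    layers = np.stack([allocate(m, t) for m, t in zip(biome_maps, density_tables)])
    return np.nanmin(layers, axis=0), np.nanmax(layers, axis=0)
```

A lookup table keeps the allocation vectorized, so each biome map is translated into densities in a single indexing step.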

Potential biomass stock map 5. A remote-sensing-based map; see Extended Data 
Fig. 4e. A fifth map was derived from the remote-sensing maps 3 and 4 on actual 
biomass stocks. For each of the 1,303 ecozones that result from the intersection of the three
biome maps*”~*? mentioned above (see Extended Data Fig. 5e), we calculated the 95th
percentile of the biomass stock values of all 30-arc-second grid cells (about 1 × 1 km at the
equator) within that ecozone, excluding agricultural lands derived from the GLC2000*4.
For ecozones covered by more than one remote-sensing map, we
used the arithmetic mean. This approximation builds on the assumption that in
each ecozone, areas of natural vegetation units remain that are representative for 
the potential biomass-stock densities of the respective ecozone and that the values 
take natural disturbance into account (owing to the grain size of the input maps 
and selection procedure). This is confirmed by a cross-check that revealed that
the 95th percentile is on average 51% lower than the maximum values found in each
ecozone. Using maximum values, the global biomass would be 1.56 times as large as
the value estimated here. An upward bias in this map could emerge from
the neglect of naturally unfavourable sites within an ecozone (owing to, for
example, low water availability or soil fertility); a downward bias could emerge if
only disturbed vegetation units prevail in an ecozone, or if most of the favourable
sites have been converted.
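The ecozone-percentile step can be sketched with plain NumPy grouping. The sketch assumes that ecozone IDs, biomass values and the agricultural mask are aligned 1-D arrays of 30-arc-second cells and that NumPy's default linear interpolation is acceptable for the percentile; the percentile convention used in the study is not stated, and the function names are hypothetical.

```python
import numpy as np


def ecozone_percentile(ecozone_ids, biomass, agricultural, q=95.0):
    """q-th percentile of biomass per ecozone, ignoring agricultural and no-data cells."""
    ecozone_ids = np.asarray(ecozone_ids)
    biomass = np.asarray(biomass, dtype=float)
    keep = ~np.asarray(agricultural, dtype=bool) & ~np.isnan(biomass)
    result = {}
    for zone in np.unique(ecozone_ids[keep]):
        values = biomass[keep & (ecozone_ids == zone)]
        result[int(zone)] = float(np.percentile(values, q))
    return result


def merge_remote_sensing(per_map_results):
    """Arithmetic mean where an ecozone is covered by more than one remote-sensing map."""
    zones = set().union(*per_map_results)
    return {z: float(np.mean([r[z] for r in per_map_results if z in r])) for z in zones}
```

Calling `ecozone_percentile` once per remote-sensing map and then `merge_remote_sensing` on the resulting dictionaries mirrors the two steps described in the text.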

Potential biomass stock map 6. An independent sixth map was taken from the 
literature°°; see Extended Data Fig. 4f. 

Calculation of the land-use-induced difference in potential-actual biomass 
stocks. In order to assess the range of the effect of land use on biomass stocks, 42 
potential-actual biomass-stock difference maps were calculated by combining 
the seven actual biomass-stock maps with the six potential biomass-stock maps. 
In all cases, we adjusted the maps where necessary, so that the actual biomass 
stocks would not exceed the potential biomass stocks. Actual biomass stocks above
potential stocks could be caused, for instance, by fire prevention. However,
the magnitude of this effect is highly uncertain at larger spatial scales, because 
fire prevention often leads to less frequent, but more damaging fires with larger 
biomass loads that could compensate for carbon gains°”** on longer time scales. 
On unused land (for example, wilderness), no land-use-induced biomass-stock
reduction was assumed. Unproductive and water areas were excluded from the
assessment. Differences in the spatial and thematic resolution of potential and actual
biomass-stock maps warrant caution when interpreting the fine-scale results of
the biomass-stock difference. 
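The 42 combinations and the capping rule can be sketched as follows. The masks for unused land and for excluded (unproductive and water) cells are assumed inputs, and treating the unused-land rule as a zero difference is one possible reading of the text; the function name is hypothetical.

```python
import itertools

import numpy as np


def difference_maps(actual_maps, potential_maps, unused_mask, excluded_mask):
    """All potential-minus-actual combinations, keyed by (actual no., potential no.)."""
    diffs = {}
    for (i, act), (j, pot) in itertools.product(
        enumerate(actual_maps, start=1), enumerate(potential_maps, start=1)
    ):
        act = np.minimum(act, pot)                    # actual stocks may not exceed potential
        diff = pot - act
        diff = np.where(unused_mask, 0.0, diff)       # no land-use effect on unused land
        diff = np.where(excluded_mask, np.nan, diff)  # unproductive and water areas excluded
        diffs[(i, j)] = diff
    return diffs
```

With seven actual and six potential maps this yields 7 × 6 = 42 difference maps, none of which can be negative because of the capping step.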

Attribution to land management and land-cover conversions. For the two actual
biomass stock maps based on consistent, detailed land-use information (actual
biomass stock maps 1 and 2), we could isolate and quantify the impact of individual
land-use types. From these maps, land-cover conversion
impacts were calculated as the sum of potential-actual biomass-stock differences 
due to cropland, artificial grassland (that is, grassland on potential forest sites) 
and infrastructure. The biomass-stock differences of all other land-use types were 
accounted for as the impact of land management (Extended Data Fig. 2). Forest 
management was considered to dominate land-management effects in forests, 
and land-management practices on other used lands were considered as grazing. 
This approach represents a proxy only. A sharp and unambiguous separation 
between land-cover conversion and land management would require information 
on past land uses, which currently is not available, as well as arbitrary decisions 
on thresholds of change. Examples to illustrate these intricacies are: the biomass 
stock change on a parcel of land that was cleared from pristine forests to crop- 
land in the past and, after cropland abandonment, is used as forest plantation, 
would be accounted for as land management, while it would, at least to a certain
degree, also represent land-cover conversion if historic uses were to be considered.
Similarly, if a forest clear-cut area is used for grazing during the re-growth phase, 
the biomass-stock difference would be attributed to land-cover conversion, 
whereas it might also represent land management. If, due to land use, a forest is 
changed in terms of its species composition, crown closure, stem height and so
on, but still remains within key forest parameters (for example, >10% tree cover,
stem height >5 m), it is ultimately an arbitrary decision whether this change is
a land-cover conversion or land management. Additionally, the effects of forest
management versus grazing cannot be fully disentangled, because of practices such
as forest grazing and wood extraction for fuel in natural grasslands. Given these
practical and theoretical ambiguities, we argue that the simple allocation scheme 
adopted here is a useful proxy based on transparent considerations, making best 
use of the available datasets. For preparation of Figs 1c and 2b, we calculated the 
contributions of land management and conversions separately for the maps based 
on the data from FRA and ref. 16. The minima of the contribution of each land-use 
type were used for the attribution. The difference between the sum of all minima and 100%
was labelled ‘ambiguous’, as it is attributed to land management in the map based
on FRA and to land-cover conversion in the map based on ref. 16, or vice versa (see
Extended Data Table 1).
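The allocation scheme and the ‘ambiguous’ share can be sketched in a few lines of Python. The class keys are hypothetical labels for the land-use types listed in the Methods, and the percentages in the usage note are chosen only to illustrate the arithmetic, not to reproduce Figs 1c or 2b.

```python
# Land-cover conversion (LCC) classes named in the text; all other used land is land management (LM).
LCC_CLASSES = {"cropland", "artificial_grassland", "infrastructure"}


def split_lcc_lm(diff_by_class):
    """Sum potential-actual differences (PgC) into LCC and LM totals."""
    lcc = sum(v for k, v in diff_by_class.items() if k in LCC_CLASSES)
    lm = sum(v for k, v in diff_by_class.items() if k not in LCC_CLASSES)
    return lcc, lm


def attribution_with_ambiguity(shares_fra, shares_ref16):
    """Keep the smaller percentage per category; the shortfall from 100% is 'ambiguous'."""
    minima = {k: min(shares_fra[k], shares_ref16[k]) for k in shares_fra}
    minima["ambiguous"] = 100.0 - sum(minima.values())
    return minima
```

With shares of 58% LCC and 42% LM in one map and 53% and 47% in the other, the minima are 53% and 42%, leaving 5% labelled ambiguous.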

Calculation of the detection limits on the basis of the actual biomass-stock 
maps. The spatially explicit detection limit for stock changes in actual biomass 
was estimated from the variation between the seven actual biomass estimates. 
This assumes that the uncertainty is driven by differences in approaches rather 
than measurement errors within a single approach and that the seven estimates 
of the actual biomass stocks are equally likely and, therefore, the main source of 
uncertainty. For each grid cell we mimicked a stocktaking at present (t) and after 
10 years (t + 10) by randomly selecting two biomass stocks from the uncertainty 
between approaches for that cell. Subsequently, the detected annual change in 
biomass stock was calculated. A distribution of 1,000 detected annual changes was 
obtained through resampling. Given that the annual changes were calculated by 
sampling the same distribution at t and t + 10, there were no underlying changes 
in biomass stock. The inner 95% of the detected stock changes within each grid
cell were assumed to be insignificant. The outer 5% of stock changes, which appear
significant even though the biomass stock is constant between t and t + 10, were
used to estimate the detection limit in that grid cell. Given present-day
uncertainties, a real stock change should thus exceed the detection limit to be 
correctly classified as a change. At present, there is no evidence to consider one
approach more precise and accurate than the other approaches”).
Nevertheless, if future advances enabled the selection of a single best approach, the
uncertainty and detection limit would decrease, in turn enhancing the capacity
for verification of changes in biomass stocks.
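The resampling procedure for one grid cell can be sketched in NumPy. Two assumptions are not stated in the text: the two stocktakes draw uniformly, with replacement, from the seven estimates for the cell, and the detection limit is taken as the larger magnitude of the 2.5th and 97.5th percentiles of the spurious annual changes. The function name and defaults are hypothetical.

```python
import numpy as np


def detection_limit(estimates, years=10, n_draws=1000, inner=0.95, rng=None):
    """Detection limit for one grid cell, in units of `estimates` per year."""
    rng = np.random.default_rng() if rng is None else rng
    estimates = np.asarray(estimates, dtype=float)
    # Stocktakes at t and t + years draw independently from the same estimates,
    # so every detected change is spurious by construction.
    stock_t = rng.choice(estimates, size=n_draws)
    stock_t_later = rng.choice(estimates, size=n_draws)
    annual_change = (stock_t_later - stock_t) / years
    # The central `inner` share of spurious changes counts as noise; its outer
    # bound is the change a real signal would have to exceed.
    tail = (1.0 - inner) / 2.0
    lower, upper = np.quantile(annual_change, [tail, 1.0 - tail])
    return float(max(-lower, upper))
```

With stocks in gC m−2, the result is in gC m−2 yr−1, the unit of the tropical detection threshold quoted in the main text.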

Code availability. Esri ArcGIS and MATLAB codes used in the compilation and
analysis of results are available upon request from the corresponding author. 
Data availability. The data sources for actual and potential biomass-stock esti- 
mates are listed above. Source Data for Figs 1b, c, 2a, b, 3a, b and Extended Data 
Fig. 1 are provided with the online version of the paper. Final results, data and maps 
are available at http://www.uni-klu.ac.at/socec. Underlying data, for example, data 
from other sources, which support findings of this study, are available from the 
corresponding author upon request. 
Extended Data Figure 1 | Estimates of the potential and actual biomass stocks from the literature and this study. a, Potential biomass stocks. b, Actual biomass stocks. Datasets from the following studies were used: (1)°, (2)3, (3)79, (4)*4, (5)79, (6)74, (7)72, (8)73, (9)°9, (10)74, (11)”°, (12)78, (13)77, (14)78, (15)7?, (16)48, (17)®°, (18)®!, (19). The darker shaded columns are those used in this study (for details see text).
Extended Data Figure 2 | Conceptual and methodological design of
the study. a, The relation of prehistoric (α), potential (β) and actual
(γ) biomass stocks. Potential vegetation refers to the vegetation that
would prevail in the absence of land use but with current environmental
conditions. As both actual and potential vegetation refer to the same
environmental conditions, their difference must not be interpreted
as a stock change between two points in time. As a consequence, the
comparison of potential and actual biomass stocks does not refer to the
cumulative net balance of all fluxes from and to the biomass compartment
(for example, induced by land-use and environmental changes). Rather,
it isolates and quantifies the effect of land use on biomass stocks. The
effect of land use consists of two components: cumulative land-use
emissions and land-use-induced reductions in the carbon sequestration that
would otherwise have resulted from environmental changes. For more information and
discussion, see Supplementary Information. b, Conceptual attribution
of the difference between potential and actual biomass stocks to land
conversion and land management. Error bars reflect the divergence among
datasets for the respective vegetation types and indicate the determination
of verification volumes.
Extended Data Figure 3 | Actual biomass stock maps used in the study.
a, FRA-based map. b-d, Maps based on refs 16 (b), 6 and 8 (c), and 7 and
8 (d). e, Remote-sensing-derived minimum. f, Remote-sensing-derived
maximum. g, Map from ref. 48. The same mask for unproductive areas
has been applied to all maps. For details and sources of maps in a-f, see
Methods.
study. a, IPCC-based, FRA-adjusted map. b, IPCC-based map adjusted
using data from ref. 16. c, Cell-based minima of classic data. d, Cell-based
… ref. 56. The same mask for unproductive areas has been applied to all
maps. For details and sources for maps in a-e, see Methods.
Extended Data Figure 5 | Land-use-induced difference in potential and
actual biomass stocks, uncertainty of input data and vegetation units
used in the study. a, Impact of land-cover conversion. b, Impact of land
management. a, b, Maps are based on the FRA-based actual biomass-stock
map and the corresponding IPCC-based, FRA-adjusted potential carbon-stock
map. c, Standard deviation of potential biomass-stock maps (n = 6).
d, Standard deviation of actual biomass-stock maps (n = 7). e, Intersection
of all three*”-> biome maps used in the ecozone approaches and for the
construction of the remote-sensing-based potential biomass-stock map.
f, FAO ecozones” used for the aggregation of results. The ‘tropical core’
consists of humid rainforests. The tropical zones contain moist deciduous
forests, dry forests, tropical shrubs, savannahs and hot deserts.
Extended Data Table 1 | Biomass stocks per type of land use

Columns: area; potential biomass stocks (total and per unit area); actual biomass stocks (total and per unit area); difference between potential and actual stocks; contribution to the total difference.

                                    Area     Potential            Actual               Difference  Contribution
                                    [Mkm²]   [PgC]    [kgC m−2]   [PgC]    [kgC m−2]   [%]         to difference [%]
Total                               130.4    876-906  6.7-6.9     407-476  3.1-3.6     48-54%      100%
Infrastructure                      1.4      12       8.6-8.7     1        0.7         92-93%      2-3%
Cropland                            15.2     139-141  9.2-9.3     10       0.6         93%         28-31%
Grassland and grazing land          54.3     374-379  6.9-7.0     119-121  2.2         69-70%      54-60%
Forests                             40.7     443-460  10.9-11.5   297-368  7.3-9.0     22-33%      23-31%
Unused non-forest land              26.2     16-17    0.6         16-17    0.6         0%          0%
Land cover change (LCC)
  Cropland                          15.2     139-141  9.2-9.3     10       0.6         93%         28-31%
  Artificial grasslands             11.3     114-116  10.1-10.3   7        0.6         94%         23-25%
  Infrastructure                    1.4      12       8.6-8.7     1        0.7         92-93%      2-3%
Land management (LM): forest management
  Used forests
    tropical                        22.3     311-327  14.0-14.7   192-251  8.6-11.3    23-38%      18-25%
    temperate                       5.4      51       9.3-9.4     33-35    6.1-6.4     32-34%      4%
    boreal                          7.0      40-41    5.7-5.8     30-32    4.2-4.6     21-25%      2%
  Subtotal forest management        34.7     401-419  11.6-12.1   255-318  7.3-9.2     24-36%      23-31%
Land management (LM): grazing
  Other wooded land, grassland-tree mosaics
    tropical                        14.6     109-110  78          47       3.2         57%         13-15%
    temperate                       4.0      11       2.8-2.9     5-6      1.2-1.4     50-58%      1-2%
    boreal                          2.9      10       3.4-3.5     5        1.5-1.7     51-56%      1%
  Natural grassland w/o trees       14.2     21       1.5         19       1.3         11-13%      0-1%
  Subtotal grazing land             35.7     151-153  4.2-4.3     75-76    2.1         50-51%      16-18%
No biomass stock change
  Wilderness, productive, w/o trees 9.7      16-17    1.6-1.7     16-17    1.6-1.7     0%          0%
  Unused forests                    6.0      42-50    7.0-8.3     42-50    7.0-8.3     0%          0%
  Unproductive area                 16.5     -        -           -        -           0%          0%
Land cover change (LCC)             27.8     265-269  9.5-9.7     17.1     0.6         94%         53-58%
Land management (LM)                56.2     553-572  7.9-8.1     312-374  4.7-5.6     31-40%      42-47%

Ranges indicate the difference between the estimates based on FRA and on ref. 16. Mkm², million km².
Extended Data Table 2 | Compilation of published estimates of emissions associated with anthropogenic land-cover change and land management until present (industrial and pre-industrial)

Reference                                    Land management activities considered                 Cumulative emissions [PgC]
Total cumulative emissions from land use
DeFries et al., 1999                         -                                                     182-199
Strassmann et al., 2008                      -                                                     233
Olofsson and Hickler, 2008                   Crop harvest                                          194-262
Pongratz et al., 2009                        -                                                     171
Kaplan et al., 2010, HYDE 3.1 based*         -                                                     137-189
Kaplan et al., 2010, KK10 based*             Land-use intensity, shifting cultivation              325-357
Stocker et al., 2014                         Wood and crop harvest, tillage, shifting cultivation  243
This study, FRA- and Pan-based               Top-down, all activities                              431-469
This study, inner quartiles of 42 estimates  Top-down, all activities                              375-525

Note that most model-based results include fluxes from soils and wood products. Datasets are from refs 20, 60-64.
*Pre-industrial emissions only.
Extended Data Table 3 | Comparison of the difference between potential and actual biomass stocks to components of the global carbon balance, including land-use change (LUC) emissions and net terrestrial biosphere sink

Columns: (i) cumulative, before 1800; (ii) cumulative, since 1800; (iii) cumulative (i + ii); this study (potential-actual biomass stock difference).

(1) LUC emissions
    (i) 353 PgC (310-395; calculated from (2) and (3))
    (ii) 140 PgC (100-180; Sabine et al.) [IPCC: 100-260 PgC]
    (iii) 493 PgC (410-575) [IPCC: 410-655 PgC]
    This study: 447 PgC (375-525)
(2) Terrestrial biosphere sink
    (i) -270 PgC (peat; e.g. Carcaillet et al., Kleinen et al., Yu et al.)
    (ii) -101 PgC (-61 to -141; Sabine et al.)
    (iii) -371 PgC (-331 to -411)
(3) Net terrestrial balance, (1) + (2)
    (i) 83 PgC (40-125; Kaplan et al.)
    (ii) 39 PgC (11-67; Sabine et al.)
    (iii) 122 PgC (51-192)

The difference in biomass stock of 447 PgC (375-525) is well in line with estimates of total (before and since 1800) cumulative emissions from LUC. For details and discussion, see Supplementary Information. Datasets are from refs 4, 20, 65-68.
Extended Data Table 4 | Hypothetical absorption potentials of carbon stock restorations and indicative years until saturation at a current emission level of 9 PgC yr−1

Restoration of                                       FRA-based  Pan-based  FRA-based  Pan-based
                                                     [PgC]      [PgC]      [yr]*      [yr]*
All C-stocks to 100% of potential                    469        431        52         48
Cropland to 100% of potential                        129        131        14         15
Artificial pastures to 100% of potential             107        109        12         12
Cropland & artificial pastures to 30% of potential   60         61         7          7
Boreal forests to 100% of potential                  10         9          1          1
Temperate forests to 100% of potential               17         16         2          2
Tropical forests to 100% of potential                119        76         13         8
All forests to 100% of potential                     147        101        16         11
Boreal forests to 90% of potential                   6          4          1          0
Temperate forests to 90% of potential                12         11         1          1
Tropical forests to 90% of potential                 88         44         10         5
All forests to 90% of potential                      106        59         12         7
Boreal forests to 80% of potential                   2          0          0          0
Temperate forests to 80% of potential                7          6          1          1
Tropical forests to 80% of potential                 57         11         6          1
All forests to 80% of potential                      66         17         7          2
Other wooded land and savannas to 100% of potential  73         75         8          8
Other wooded land and savannas to 80% of potential   47         49         5          5

Note that a restoration to 100% of the potential probably entails a cessation of the respective land use, owing to the intrinsic relationship between harvest and carbon stocks9.

*Years until saturation at current carbon emissions of 9 PgC yr−1.
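The "Years*" columns appear to follow from dividing each restoration potential by the 9 PgC yr−1 emission level and rounding to whole years; this is an inference from the listed values, not a method stated in the table. A minimal Python check, with hypothetical names:

```python
EMISSIONS_PGC_PER_YR = 9.0  # current emission level used in the table


def years_until_saturation(restoration_pgc, emissions_pgc_per_yr=EMISSIONS_PGC_PER_YR):
    """Years of current emissions that a restoration potential would offset (rounded)."""
    return round(restoration_pgc / emissions_pgc_per_yr)
```

For the FRA-based restoration of all carbon stocks (469 PgC) this gives 52 years, and for the Pan-based value (431 PgC) 48 years, matching the first row.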
Sooty mangabey genome sequence provides insight
into AIDS resistance in a natural SIV host 
In contrast to infections with human immunodeficiency virus (HIV)
in humans and simian immunodeficiency virus (SIV) in macaques, 
SIV infection of a natural host, sooty mangabeys (Cercocebus atys), 
is non-pathogenic despite high viraemia'. Here we sequenced and 
assembled the genome of a captive sooty mangabey. We conducted 
genome-wide comparative analyses of transcript assemblies from 
C. atys and AIDS-susceptible species, such as humans and macaques, 
to identify candidates for host genetic factors that influence 
susceptibility. We identified several immune-related genes in 
the genome of C. atys that show substantial sequence divergence 
from macaques or humans. One of these sequence divergences, 
a C-terminal frameshift in the toll-like receptor-4 (TLR4) gene 
of C. atys, is associated with a blunted in vitro response to TLR-4 
ligands. In addition, we found a major structural change in exons 
3-4 of the immune-regulatory protein intercellular adhesion 
molecule 2 (ICAM-2); expression of this variant leads to reduced 
cell surface expression of ICAM-2. These data provide a resource for 
comparative genomic studies of HIV and/or SIV pathogenesis and 
may help to elucidate the mechanisms by which SIV-infected sooty 
mangabeys avoid AIDS. 

SIV infection of natural hosts, such as sooty mangabeys, is typically 
non-pathogenic despite high viraemia. This is in stark contrast to 
HIV infection in humans and experimental SIV infection in rhesus 
macaques (Macaca mulatta) that progress to AIDS unless treated with 
antiretroviral therapy. The main virological and immunological fea- 
tures of natural SIV infection in sooty mangabeys have been described 
over the past 15 years in studies that compared and contrasted this 
infection with the pathogenic infections of HIV and SIV in humans 
and rhesus macaques. SIV-infected sooty mangabeys show several fea- 
tures that have been observed in pathogenic infections, including high 
viraemia, short in vivo lifespan of productively infected cells, depletion 
of mucosal CD4⁺ T cells, strong type-I interferon response in the acute
infection, and cellular immune responses that fail to control virus replication. However, in contrast to pathogenic infections, SIV-infected
sooty mangabeys (i) have healthy CD4⁺ T cell levels; (ii) do not experience mucosal immune dysfunction, avoiding depletion of T helper 17
(TH17) cells, intestinal epithelial damage and microbial translocation;
(iii) maintain low levels of immune activation during the chronic infection; and (iv) achieve compartmentalization of virus replication that
preserves central-memory and stem-cell memory CD4⁺ T cells as well
as follicular TH cells!”. An additional notable feature of SIV infection
in natural hosts is the low rate of mother-to-infant transmission, which is
related to low expression of CCR5 on circulating and mucosal CD4⁺ T
cells’. Although many aspects of the natural course of SIV infection in
sooty mangabeys have now been described, the key molecular mechanisms by which these animals avoid AIDS remain poorly understood.

In this study, we sequenced the genome of a captive sooty mangabey 
and compared this genome to the genomes of AIDS-susceptible pri- 
mates to look for candidate genes that may influence susceptibility 
to AIDS in SIV-infected hosts. We sequenced genomic DNA to a 
whole-genome coverage of about 180× using the Illumina HiSeq
2000 platform, and produced an initial assembly using ALLPATHS-LG, 
Atlas-Link and Atlas-GapFill (see Methods for details). The total size 
of the assembled C. atys genome (Caty_1.0; NCBI accession num- 
ber GCA_000955945.1) is around 2.85 Gb, with a contig N50 size of 
112.9kb and scaffold N50 size of 12.85 Mb (Table 1). Genome anno- 
tation identified 20,829 protein-coding genes and 4,464 non-coding 
genes in the C. atys assembly, which is comparable to other available 
draft quality genomes of nonhuman primates (Table 1). These anal- 
yses demonstrate that the Caty_1.0 reference genome is of sufficient 
quality to facilitate population-scale whole-genome and transcriptome 
sequencing studies. 

To identify novel immunogenetic factors specific to C. atys that may 
be involved in the ability of this species to avoid progression to AIDS, 
we established a bioinformatic pipeline for a comparative protein analy- 
sis (Fig. 1 and Extended Data Fig. 1, see Methods for details). Using this 
approach, we found 34 candidate immune-related genes with sequences 
that diverged between C. atys and M. mulatta (Table 1 and Extended 
Data Table 1). Although we cannot exclude a role of immune genes 
with minor differences in C. atys and M. mulatta, the highly divergent 
genes listed in Table 1 and Extended Data Table 1 constitute candidate 
genes involved in the outcomes of SIV infection in these two species. 
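Steps (4) and (5) of the pipeline in Fig. 1 reduce to two filters over RNA-seq-supported orthologue pairs. The first drops pairs that share more than 97% identity with the M. mulatta protein. The second keeps pairs annotated with the Gene Ontology term "immune response". The minimal Python sketch below assumes each candidate has already been reduced to a record holding a percent identity, a flag for gapped or partial alignments (mentioned in Methods), and a set of GO term names. The record type, field names, gene names and values are all hypothetical.

```python
from dataclasses import dataclass, field

IDENTITY_THRESHOLD = 97.0           # percent identity to the M. mulatta protein (step 4)
IMMUNE_GO_TERM = "immune response"  # GO term used to select immune genes (step 5)

@dataclass
class Candidate:
    gene: str                         # hypothetical gene label
    rnaseq_supported: bool            # passed the RNA-seq support check (steps 2-3)
    percent_identity: float           # C. atys vs M. mulatta alignment identity
    gapped_or_partial: bool = False   # alignment has gaps or does not span the full protein
    go_terms: set = field(default_factory=set)

def divergent_immune_candidates(candidates):
    """Return gene labels that pass the RNA-seq, divergence and GO filters."""
    kept = []
    for c in candidates:
        if not c.rnaseq_supported:
            continue
        divergent = c.percent_identity < IDENTITY_THRESHOLD or c.gapped_or_partial
        if divergent and IMMUNE_GO_TERM in c.go_terms:
            kept.append(c.gene)  # step 6 (manual inspection) happens outside this code
    return kept

# Hypothetical records; identities are illustrative and not taken from the paper.
example = [
    Candidate("GENE_A", True, 99.2, go_terms={"immune response"}),
    Candidate("GENE_B", True, 91.0, go_terms={"immune response"}),
    Candidate("GENE_C", True, 88.0, go_terms={"metabolic process"}),
    Candidate("GENE_D", False, 80.0, go_terms={"immune response"}),
    Candidate("GENE_E", True, 98.5, gapped_or_partial=True, go_terms={"immune response"}),
]
print(divergent_immune_candidates(example))  # ['GENE_B', 'GENE_E']
```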

Our screen identified sequence divergence in a number of proteins 
that are important during HIV infection, such as APOBEC3C (91.6%) 
and BST2 (also known as tetherin, 95.1%), as well as pattern-recognition 
receptors (MBL2, CLEC4A, CLEC4D and CLEC6A), the antiviral 
sensor cyclic GMP-AMP synthase (cGAS (also known as MB21D1)) 
and other immune mediators (Extended Data Table 1). Because 
CD4 and CCR5 are important for AIDS pathogenesis, we aligned 
the sequences of CaCD4 and CaCCR5 to MmCD4 and MmCCR5,
respectively*®. Neither gene showed any major structural changes
in the wild-type variants, although CD4 was slightly below the 97%
threshold of identity (Extended Data Fig. 1b, c). In addition, we found
specific gene families in C. atys that are expanded relative to M. mulatta,
humans and other primates (Extended Data Table 2a). Notably, we
detected localized regions of increased substitution, defined by a clustered difference of three or more amino acids, in 10 genes. The most
marked variations in the amino acid sequence of C. atys compared to
M. mulatta were observed in ICAM-2 and TLR-4 (Table 1).

Table 1 | C. atys assembly statistics and proteins with major structural variations in the C. atys genome

Assembly
Average coverage per base            192
Total sequence length                2,848,246,356 bp
Total assembly gap length            60,973,502 bp
Number of scaffolds                  11,433
Scaffold N50                         12,849,131 bp
Scaffold L50                         66
Number of contigs                    76,752
Contig N50                           112,942 bp
Contig L50                           6,930
GC content                           40.90%

Annotation
Protein-coding genes                 20,829
Non-coding genes                     4,464
Pseudogenes                          5,263
mRNA transcripts                     65,920
lncRNA transcripts                   6,299
Exons in coding transcripts          250,660
Exons in non-coding transcripts      42,280

Gene     Function                                              Variation type    Length variation (amino acids)
ICAM2    Lymphocyte extravasation and recirculation            indel, fs         107
TLR4     LPS sensing                                           indel, fs         17
BPIFA1   Antimicrobial function in airways                     indel             8
NOS2     Proinflammatory messenger                             pm, early stop    8
MBL2     Pattern recognition receptor for microbial products   pm, early start   [illegible]
TREM2    Chronic proinflammatory signalling in myeloid cells   indel, fs         6
PLSCR1   Enhancement of the interferon response                indel             5
LSP      Inhibition of lymphocyte proliferation                indel, fs         5
CRTAM    T and natural killer cell activation                  pm, indel         4

Structural variations were identified by the immunogenomic comparison pipeline. N50, 50% of the genome is in fragments of this length or longer; L50, smallest number of fragments needed to cover more than 50% of the genome; lncRNA, long non-coding RNA; indel, insertion/deletion; fs, frameshift; pm, point mutation.
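The N50 and L50 values in Table 1, and the N25, N50 and N75 values reported for the transcriptome assembly in Methods, come from one cumulative calculation. The minimal Python sketch below generalizes it to any Nx. Like most assembly tools, it measures against the total assembled length, which is our assumption. The contig lengths are hypothetical and not taken from Caty_1.0.

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    Fragments are sorted longest first and summed until the running total
    reaches x% of the total length. Nx is the length of the fragment that
    crosses the threshold, and Lx is how many fragments were needed.
    """
    if not lengths:
        raise ValueError("no fragments")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    target = sum(lengths) * x / 100
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable: the full sum always reaches the target")

contigs = [100, 80, 60, 40, 20]  # hypothetical contig lengths (bp)
for x in (25, 50, 75):
    n, l = nx_lx(contigs, x)
    print(f"N{x} = {n} bp, L{x} = {l}")
# N25 = 100 bp, L25 = 1
# N50 = 80 bp, L50 = 2
# N75 = 60 bp, L75 = 3
```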

ICAM-2 is an approximately 60-kDa transmembrane glycoprotein 
of the immunoglobulin superfamily, which is expressed on various 
immune cells and implicated in lymphocyte homing and recircula- 
tion’. ICAM-2 ligands are lymphocyte function-associated antigen-1 
and the C-type lectin DC-SIGN’. We discovered a misalignment of 
the ICAM-2 proteins between C. atys and M. mulatta that starts in 
exon 3 (Extended Data Fig. 2a). This difference is explained by a 499- 
bp deletion starting from exon 3 of CaICAM2, as detected by PCR 
and Sanger sequencing (Fig. 2a and Extended Data Fig. 3). We subse- 
quently confirmed the expression of this truncated form of ICAM-2 
in ten out of ten additional C. atys genome sequences (Extended Data 
Fig. 2b). By contrast, analysis of the whole-genome sequences of 15 
baboons and more than 130 rhesus macaques demonstrated that only
the full-length ICAM-2 protein was found in all individuals (data not shown)*. The ICAM-2 deletion may be specific to C. atys, as it is not present in any other known primate sequences, including those of other natural SIV hosts such as the African green monkey, drill and colobus monkey. Transcript models generated from de novo assembled C. atys RNA-sequencing (RNA-seq) data from 14 different tissues showed that the mature mRNA sequence of CaICAM2 retains substantial portions of what is intronic sequence in other nonhuman primates, and thus codes for a markedly different final gene product (Extended Data Figs 2, 3). Splice-junction sequence analysis showed intact splicing for all four exons in M. mulatta, but no splice junctions were found between exons 3 and 4 in C. atys, indicating severe splicing defects due to the deletion (Extended Data Fig. 4).

To test whether the observed genetic difference in ICAM2 has functional consequences, we measured ICAM-2 surface expression on immune cells from humans, M. mulatta and C. atys with an antibody that recognizes an epitope conserved among these species’. ICAM-2 was readily detected on T cells and B cells from humans and M. mulatta, but not on those from C. atys (Fig. 2b, c), suggesting that ICAM-2 is not functional in lymphocytes of C. atys. However, a truncated, lower-molecular-weight form of ICAM-2 could be detected intracellularly by western blot in C. atys cells (Fig. 2d), demonstrating the presence of the predicted truncated ICAM-2 protein. Overall, these data indicate that a species-specific sequence difference in CaICAM2 abrogates surface expression of this protein in C. atys. Further studies are needed to elucidate potential links between this truncated form of ICAM-2 and the remarkable immunological features of SIV infection in this species.

Figure 1 | Bioinformatic pipeline for the identification of divergent C. atys proteins. (1) Sooty mangabey (SM) orthologues were selected by BLAST alignment of C. atys NCBI protein predictions (blue) to curated rhesus macaque (RM) protein models (green”), and alignment scores were calculated. (2) NCBI transcript predictions with RNA-seq support were identified by BLAT alignment of de novo assembled C. atys RNA-seq transcripts (orange) to C. atys NCBI coding sequence (CDS) predictions (red). (3) Subsequently, the corresponding RNA-seq-supported C. atys NCBI protein predictions were selected. (4) C. atys proteins with high similarity (>97% identity) to M. mulatta proteins were filtered out. (5) Immune genes according to Gene Ontology (GO) term classification (immune response) were chosen for further analysis and (6) confirmed by manual inspection.
[Figure 1 flowchart box: SM NCBI protein predictions with RNA-seq support (8,902 genes)]

Figure 2 | Genomic deletion in CaICAM2 results in a truncated and dysfunctional protein. a, PCR to confirm a putative 0.5-kb deletion in CaICAM2. b, ICAM-2 surface expression of primary CD4⁺ cells by flow cytometry. n = 3; representative plots for c. c, ICAM-2 surface expression in B cells, CD4⁺ and CD8⁺ T cells from human, rhesus macaques and

TLR-4 is a pattern recognition receptor that senses lipopolysaccharides
(LPS) on Gram-negative bacteria and initiates pro-inflammatory
cytokine induction, maturation and activation in macrophages, 
dendritic cells and other immune cells. During pathogenic HIV or 
SIV infections, exacerbated TLR-4 stimulation and concomitant pro- 
inflammatory signalling elicited by microbial translocation is con- 
sidered a primary mechanism that underlies HIV-induced chronic 
immune activation!®!!. Here, we found that the TLR-4 protein 
sequences of M. mulatta and C. atys are markedly different at the 
C terminus (Extended Data Fig. 5a). We confirmed the underlying 
difference in the TLR4 nucleotide sequence by Sanger sequencing 
(Extended Data Fig. 5b, c). We next analysed the genomic DNA sequence 
of TLR4 in 10 additional sooty mangabeys and found that the observed 
DNA sequence difference was present in all individuals (Extended Data 
Fig. 6a). Alignment of TLR-4 protein sequences from different primate 
species revealed that the 17-amino-acid longer C-terminal sequence 
is only found in natural SIV hosts, such as African green monkey, drill 
and colobus monkey (Fig. 3a), whereas non-natural hosts, including
M. mulatta and baboons, express the short TLR-4
C-terminal sequence.

The divergence of TLR-4 amino acid sequences among Old World primates shows an interesting pattern of molecular evolution. First, the genomic sequence encoding the TLR4 C terminus is defined by a 1-bp deletion causing a frameshift in all Old World monkeys, both natural and non-natural hosts, including colobine and cercopithecine lineages, but it is not found in either hominoids (apes and humans) or platyrrhines (New World monkeys) (Extended Data Fig. 6b). This suggests that this mutation occurred after the hominoid–Old World monkey divergence approximately 25 million years ago'”. Second, there is a G-to-A nucleotide substitution in the non-natural host Old World monkeys (baboons and macaques) that creates a truncated protein in these species® (Extended Data Fig. 6b). Although a naive analysis of this pattern would suggest two independent mutational changes in TLR4, the short internal branch of the species tree implies that incomplete lineage sorting of an ancestral polymorphism could also generate this pattern’? (Fig. 3b). To test this hypothesis, we examined the TLR4 gene tree among 17 primate species. While generally supporting the relationships among these species (Fig. 3b), the analysis also found a number of nucleotide positions, spaced throughout the gene, that are consistent with incomplete lineage sorting between C. atys, baboon and M. mulatta (Extended Data Fig. 7). The incomplete lineage sorting hypothesis is also more likely given that balancing selection is often found to act on immune-related genes. Therefore, even though baboons are believed to be more closely related to sooty mangabeys and drills than to rhesus macaques, the phylogeny of Old World monkeys is compatible with a single G-to-A mutation creating the truncated form of the protein in the common ancestor of baboons, rhesus macaques and sooty mangabeys'”"* (Fig. 3b).

Figure 2 (continued) | (c, continued) sooty mangabeys. n = 3 biologically independent samples for each species. d, ICAM-2-specific western blot using peripheral blood mononuclear cells from M. mulatta and C. atys. n = 3 M. mulatta; n = 2 C. atys; one representative biological sample per species is shown. For gel source data, see Supplementary Figs 1, 3.

We next investigated potential differences in TLR-4 function between
M. mulatta and C. atys. Our previous work has shown that macrophages from C. atys exhibit higher expression of tetherin, APOBEC
and TRIM5α in response to LPS compared to M. mulatta’. This is
consistent with the relative resistance of C. atys macrophages to in vivo
SIV infection after experimental CD4⁺ T cell depletion compared to
SIV-infected M. mulatta macrophages'®. Here we analysed cytokine
gene expression and protein production after LPS stimulation, and
found reduced mRNA expression and secretion of TNF (also known
as TNF-α) and IL-6 in cells from C. atys compared to M. mulatta
(Fig. 3c, d). Because some commercial LPS preparations contain
lipoprotein contaminants that can induce TLR-2 signalling, we confirmed the TLR-4 specificity of the reduced LPS response using the
selective TLR-4 agonist!” lipid A (Extended Data Fig. 8a, b). Next,
we found that the species-specific differences between C. atys and
M. mulatta in LPS-induced TNF and IL-6 production were maintained
during acute and chronic infection (Fig. 3e and Extended Data Fig. 8c).
Additionally, we did not observe any difference in the mRNA levels of TLR4 in cells from C. atys and M. mulatta, nor did the expression of any factors in the TLR-4–MyD88–TRIF signalling axis correlate with TNF and IL-6 production (Extended Data Fig. 8d and Extended Data Table 3). To characterize the effect of attenuated TLR-4 signalling in C. atys more broadly, we performed comparative RNA-seq profiling of LPS-treated monocytes and found lower levels of CaTNF and CaIL6 mRNA (Extended Data Fig. 8e). Moreover, using gene set enrichment analysis (GSEA), we observed that induction of pro-inflammatory genes was broadly and significantly reduced in cells from C. atys (Fig. 3f, g and Extended Data Fig. 9). Overall, these results indicate that LPS stimulation of blood cells from C. atys results in blunted production of pro-inflammatory cytokines. To establish a link between the C-terminal TLR4 sequence difference and the responsiveness to LPS, we analysed the TLR-4 orthologues of humans, C. atys and M. mulatta in an NF-κB reporter assay. We observed a significantly attenuated NF-κB response to LPS of C. atys TLR-4 (CaTLR-4) compared to M. mulatta TLR-4 (MmTLR-4). Using chimaeric constructs encoding MmTLR-4 with the C terminus of CaTLR-4 or CaTLR-4 with the C terminus of MmTLR-4, we confirmed that the TLR-4 C terminus is responsible for this phenotypic difference (Fig. 3h). This demonstrates a sequence–function relationship of the TLR-4 C terminus and suggests a novel mechanism contributing to the lower immune activation of SIV-infected sooty mangabeys.

Figure 3 | The TLR-4 C terminus is distinctive in natural SIV hosts. a, Alignment of C-terminal TLR-4 protein sequences from different primate species (starting at human TLR-4 amino acid position 821). b, Primate phylogenetic tree with colour-coding according to the TLR-4 C terminus as indicated in a. Phylogeny appears as in ref. 14. c, Cytokine release from blood of rhesus macaques (n = 9 biologically independent samples) and sooty mangabeys (n = 8 biologically independent samples) after LPS stimulation as measured by cytometric bead array. d, mRNA expression in whole blood after LPS stimulation quantified by quantitative PCR (qPCR). n = 4 biologically independent samples for each species. e, TNF and IL-6 cytokine release from blood of rhesus macaques and sooty mangabeys over the course of SIV infection. n = 5 biologically independent samples for each species. Data are mean ± s.d. (c–e); unpaired two-sided Student's t-test, P values are indicated (c, d). f, Gene set enrichment analysis of LPS-stimulated monocytes of rhesus macaques and sooty mangabeys using the TNF signalling via NF-κB hallmark gene set. g, GSEA of LPS-stimulated monocytes of rhesus macaques and sooty mangabeys using the IL6 JAK-STAT3 hallmark gene set. h, NF-κB response to LPS of primate TLR4 variants in transfected HEK293T cells. NF-κB firefly-luciferase signals were normalized to Gaussia luciferase signals, and the relative increase in NF-κB activity compared to unstimulated controls (100%) was calculated. Data are mean ± s.e.m. of n = 5 independent experiments performed with triplicate transfections. Unpaired two-sided Student's t-test, P values are indicated. For source data of the animal studies, see Supplementary Table 1. RM SM-CT, MmTLR-4 with the C terminus of CaTLR-4; SM RM-CT, CaTLR-4 with the C terminus of MmTLR-4.
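The normalization described for the NF-κB reporter assay in Fig. 3h can be written out explicitly. Each well's firefly luciferase (NF-κB reporter) reading is divided by its Gaussia luciferase (transfection control) reading. The stimulated ratio is then expressed relative to the unstimulated control, which is set to 100%. The minimal Python sketch below averages the triplicate wells before comparing conditions, which is our assumption about the order of operations. The luminescence readings are hypothetical.

```python
from statistics import mean

def normalized_ratio(firefly, gaussia):
    """Mean firefly/Gaussia ratio over replicate wells (reporter / transfection control)."""
    if len(firefly) != len(gaussia):
        raise ValueError("firefly and Gaussia readings must be paired per well")
    return mean(f / g for f, g in zip(firefly, gaussia))

def percent_of_unstimulated(stim_firefly, stim_gaussia, unstim_firefly, unstim_gaussia):
    """NF-kB activity after LPS as a percentage of the unstimulated control (= 100%)."""
    stim = normalized_ratio(stim_firefly, stim_gaussia)
    unstim = normalized_ratio(unstim_firefly, unstim_gaussia)
    return 100 * stim / unstim

# Hypothetical triplicate luminescence readings for one TLR-4 variant
activity = percent_of_unstimulated(
    stim_firefly=[6000, 6600, 5400], stim_gaussia=[1000, 1100, 900],
    unstim_firefly=[2000, 2200, 1800], unstim_gaussia=[1000, 1100, 900],
)
print(f"{activity:.0f}% of unstimulated")  # 300% of unstimulated
```

Dividing by the Gaussia signal corrects for differences in transfection efficiency between wells, so variants can be compared on the same scale.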

Over the past decade the genomes of more than 25 nonhuman 
primate species have been sequenced, assembled and annotated!®. This 
knowledge has improved our understanding of primate evolution, 
biology and general physiology, which has informed human biology 
and medicine. Here, we report a high-coverage, high-contiguity 
whole-genome sequence for C. atys, a natural SIV host. Comparative 
genomic analyses of natural and non-natural SIV hosts provide 
candidate genes that potentially influence susceptibility to AIDS 
in SIV-infected hosts. We have previously used transcriptomics to
characterize the host response to SIV infection of C. atys and African green
monkeys'®””. Here, we examined the mechanisms of AIDS resistance
of a natural SIV host genome-wide using genome sequencing. We identified candidate genes that show sequence changes that are specific to
C. atys and two gene products (ICAM-2 and TLR-4), which show struc- 
tural differences between C. atys and M. mulatta that may influence 
cell-surface expression (ICAM-2) and downstream signalling (TLR-4) 
of these proteins. Our findings may also explain prior results showing 
that not all natural SIV hosts respond to infection in the same way, sug- 
gesting that in each primate species, multiple distinct mechanisms may 
contribute to the phenotype, rather than mutations in single genes, as 
has been purported, and eventually refuted, in other studies!. Further 
comparative studies with additional natural SIV host species may 
identify additional similarities (or differences) in the genes involved 
in the evolutionary pathways that led to AIDS resistance in different 
species of African nonhuman primates. 

In this study, we used whole-genome sequencing and comparative 
genomic analysis to identify candidate genes regulating host resistance 
to AIDS. Future studies in which these candidate genes are manipulated 
in vivo during SIV infection are needed to characterize to what extent 
these genes may influence the non-pathogenic nature of SIV infection 
in sooty mangabeys. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Sequencing and assembly of the sooty mangabey genome. DNA from a female 
sooty mangabey (C. atys) born and maintained at the Yerkes National Primate 
Research Center was extracted from whole blood. The animal selected for sequencing 
was one of the original dams of a large matrilineal line of the colony. In addition, 
she possessed the most common MHC haplotype observed within the group. As 
such, her genetic constitution within the closed population was thought to be the 
most representative of any single animal. All animals were housed at the Yerkes 
National Primate Research Center of Emory University and maintained in accord- 
ance with US NIH guidelines. All studies were approved by the Emory University 
Institutional Animal Care and Use Committee. Following quality control to
ensure purity and molecular weight, a series of Illumina sequencing libraries were
prepared using standard procedures. Paired-end libraries with nominal insert sizes of
180 bp and 500 bp were produced. In brief, 1 µg of DNA was sheared to the desired
size using a Covaris S-2 system. Sheared fragments were purified with Agencourt 
AMPure XP beads, end-repaired, dA-tailed and ligated to Illumina universal adap- 
tors. After adaptor ligation, DNA fragments were further size selected by agarose 
gel and PCR amplified for six to eight cycles using Illumina P1 and Index primer 
pair and Phusion High-Fidelity PCR Master Mix (New England Biolabs). The 
final library was purified using Agencourt AMPure XP beads and quality assessed 
by Agilent Bioanalyzer 2100 (DNA 7500 kit) to determine library quantity and 
fragment size distribution before sequencing. 

Long mate-pair libraries with 2-kb, 3-kb, 5-kb and 8-kb insert sizes were con- 
structed according to the manufacturer's protocol (Mate Pair Library v.2 Sample 
Preparation Guide 15001464 Rev. A Pilot Release). In brief, 5 µg (for 2- and 3-kb
size libraries) or 10 µg (5- and 8-kb libraries) of genomic DNA was sheared to
the desired size by Hydroshear (Digilab), then end-repaired and biotinylated.
Fragment sizes between 1.8–2.5 kb (2 kb), 3.0–3.7 kb (3 kb), 4.5–6.0 kb (5 kb) or
8–10 kb (8 kb) were purified from a 1% low-melting agarose gel and circularized by
blunt-end ligation. These size-selected circular DNA fragments were then sheared
to 400 bp (Covaris S-2), purified using Dynabeads M-280 Streptavidin Magnetic
Beads, end-repaired, dA-tailed and ligated to Illumina PE sequencing adapters.
DNA fragments with adaptor molecules on both ends were amplified for 12 to 
15 cycles with Illumina P1 and Index primers. Amplified DNA fragments were 
purified with Agencourt AMPure XP beads. Quantification and size distribution
of the final library were determined as described above before sequencing.

Sequencing was performed on Illumina HiSeq 2000 instruments, generating 
100-bp paired-end reads. Raw sequences have been deposited in NCBI under 
Bioproject PRJNA157077. Reads were assembled using ALLPATHS-LG and fur- 
ther scaffolded and gap-filled using in-house tools Atlas-Link (v.1.0) and Atlas 
GapFill (v.2.2) (https://www.hgsc.bcm.edu/software/)”. Atlas-Link is a scaffolding
or super-scaffolding method that uses all unused mate pairs to increase scaffold 
sizes and create new scaffolds in draft-quality assemblies. Those modified scaffolds 
are then ordered and oriented. Atlas GapFill is run on a super-scaffolded assembly. 
Regions with gaps are identified and reads mapping within or across those gaps 
are locally assembled using different assemblers (Phrap, Newbler and Velvet) in 
order to bridge the gaps with the most conservative assembly of previously unin- 
corporated reads. 

PBJelly (v.14.9.9) is a pipeline that improves the contiguity of draft assemblies 
by filling gaps, increasing contig sizes and super scaffolding by making use of 
long reads”*. We used 12.3× coverage of long Pacific Biosciences RS I and RS II
sequences, along with the gap-filled Illumina read assembly, as input into PBJelly
to produce the final C. atys hybrid Illumina–PacBio assembly. This assembly is
available at NCBI as Caty_1.0 (RefSeq accession GCF_000955945.1).

The total size of the assembled C. atys genome is around 2.85 Gb, with a contig 
N50 size of 112.9 kb and scaffold N50 size of 12.85 Mb (Table 1). By comparison, 
this contig N50 size is greater than equivalent values for 22 of the 26 other non- 
human primate genome assemblies currently available. To assess completeness, 
we mapped 21,772 human protein-coding canonical transcripts to Caty_1.0 and 
found that 94.9% map to this C. atys genome with lengths of 95-100% (97.3% of 
transcripts map at length 70% or greater). As a more stringent test, we mapped 3,023
Benchmarking Universal Single-Copy Orthologues (BUSCO) genes and found 
that over 95% are present in Caty_1.0 (88.8% complete single copy and the others 
present but duplicated or fragmented)”. 
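The transcript-mapping figures above are cumulative fractions: the share of transcripts whose aligned length covers at least a given percentage of the transcript (95% and 70% in the text). The minimal Python sketch below computes such fractions. It assumes per-transcript coverage values have already been obtained from an aligner, and the coverage values are hypothetical.

```python
def fraction_at_least(coverages, threshold):
    """Fraction of transcripts whose aligned length is >= threshold (both in percent)."""
    if not coverages:
        raise ValueError("no transcripts")
    return sum(c >= threshold for c in coverages) / len(coverages)

# Hypothetical per-transcript alignment coverage (% of transcript length)
coverage = [100, 99, 97, 96, 95, 90, 80, 72, 60, 30]
print(f">=95%: {fraction_at_least(coverage, 95):.1%}")  # >=95%: 50.0%
print(f">=70%: {fraction_at_least(coverage, 70):.1%}")  # >=70%: 80.0%
```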

Genome annotation was performed through the NCBI Genome Annotation 
Pipeline, which generated models for genes, transcripts and proteins”°. To aid 
accurate transcript annotation, the NCBI pipeline incorporated RNA-seq data 
from a sooty mangabey pooled tissue reference sample, and data from 14 sepa- 
rate tissues produced through a joint effort by the Nonhuman Primate Reference 
Transcriptome Resource (NHPRTR; http://www.nhprtr.org/)”” and the Human 
Genome Sequencing Center (HGSC) of Baylor College of Medicine. The NCBI 
process also used human RefSeq and GenBank transcripts along with other 
primate protein data. 


Sequencing and polymorphism screen of 10 sooty mangabeys. DNA was prepared from blood or liver samples from 10 sooty mangabeys from the Yerkes National Primate Research Center (YNPRC) colony. These ten breeder animals were selected, in consultation with the YNPRC Breeding Manager, to represent at least 90% of colony diversity based on the pedigree of the colony. Illumina paired-end libraries (300-bp insert size) were prepared as described above for 500-bp paired-end libraries. These libraries were sequenced (100-bp reads) on a HiSeq 2000 instrument, producing an average of 30× whole-genome coverage across individuals. These reads were mapped to the C. atys assembly using BWA-MEM, and single-nucleotide variants were called using GATK (https://software.broadinstitute.org/gatk/). A gVCF file was created for each animal, and variation in the regions of interest for TLR4 and ICAM2 was identified in those files.
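Pulling TLR4 and ICAM2 variation out of each animal's gVCF is essentially an interval filter on the CHROM and POS columns. The minimal Python sketch below reads plain-text VCF records and skips gVCF reference blocks, whose ALT column is only `<NON_REF>`. The scaffold names and coordinates in `REGIONS` are hypothetical placeholders, not the real Caty_1.0 coordinates of either gene. In practice a dedicated tool such as bcftools or GATK would usually do this step.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical regions of interest: gene -> (scaffold, start, end), 1-based inclusive.
REGIONS: Dict[str, Tuple[str, int, int]] = {
    "TLR4": ("scaffold_A", 10_000, 25_000),
    "ICAM2": ("scaffold_B", 5_000, 9_000),
}

def variants_in_regions(vcf_lines: Iterable[str]) -> List[Tuple[str, str, int, str, str]]:
    """Return (gene, chrom, pos, ref, alt) for variant records inside any region."""
    hits = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        chrom, pos_field, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if alt == "<NON_REF>":
            continue  # gVCF reference block, not a variant call
        pos = int(pos_field)
        for gene, (r_chrom, start, end) in REGIONS.items():
            if chrom == r_chrom and start <= pos <= end:
                alts = ",".join(a for a in alt.split(",") if a != "<NON_REF>")
                hits.append((gene, chrom, pos, ref, alts))
    return hits

example_gvcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "scaffold_A\t12000\t.\tG\tA,<NON_REF>\t50\t.\t.",
    "scaffold_A\t12001\t.\tC\t<NON_REF>\t.\t.\tEND=12500",
    "scaffold_B\t4000\t.\tT\tC,<NON_REF>\t40\t.\t.",
    "scaffold_B\t6000\t.\tAT\tA,<NON_REF>\t60\t.\t.",
]
print(variants_in_regions(example_gvcf))
# [('TLR4', 'scaffold_A', 12000, 'G', 'A'), ('ICAM2', 'scaffold_B', 6000, 'AT', 'A')]
```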

Polymorphism screen among rhesus macaques. To assess variation in TLR4 and
ICAM2 among rhesus macaques, we used our database of whole-genome sequence
data from 133 individuals of this species. The details of sequencing and single-nucleotide variant discovery for this population have previously been described®.
The population-level VCF file for this study was examined for relevant variation 
in these two genes. 

Targeted re-sequencing of ICAM2 and TLR4 in rhesus macaques and sooty 
mangabeys. To test the validity of the apparent species differences in ICAM2 and 
TLR4 between rhesus macaques and sooty mangabeys, we designed primers
to flank three areas of interest (see Extended Data Figs 3a, 5b), performed PCR
using genomic DNA from two rhesus macaques and two sooty mangabeys
(including FAK, the animal used for the Caty_1.0 reference genome) and subjected the PCR
products to Sanger sequencing. PCR primers were designed using
Primer3 with default settings with the exception that the human mis-priming 
library was selected (http://bioinfo.ut.ee/primer3/)"*”*. Primers were tailed with 
M13 sequences to facilitate Sanger sequencing. 

PCR primer pairs (5′ M13 tail followed by the gene-specific sequence):
ICAM2_Ex2_F  GTAAAACGACGGCCAGTATGTGCAGGTGGAGTGTGAT
ICAM2_Ex2_R  GGAAACAGCTATGACCATGGCTCGAACAGACTCAGTGGA
ICAM2_Ex3_F  GTAAAACGACGGCCAGTAAGCAGAGCAGGACAGATGT
ICAM2_Ex3_R  GGAAACAGCTATGACCATGACTCTGCACAGTCAGACCTT
TLR4_SL_F    GTAAAACGACGGCCAGTACCATGGAATGACTTGCCCT
TLR4_SL_R    GGAAACAGCTATGACCATGCCTTTCAGCTCTGCCTTCAC

AmpliTaq Gold 360 DNA Polymerase (Applied Biosystems) was used to amplify 
PCR products using the following protocol: 95 °C for 10 min; 10 cycles of 95 °C for 30 s,
65 °C for 30 s and 72 °C for 30 s (annealing temperature decreased by 1 °C per
cycle); 30 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s; 72 °C for 10 min. PCR
products were subjected to Sanger sequencing (in both directions) using M13
primers. PCR and Sanger sequencing were performed at ACGT. Traces (see Fig. 2a
for examples) were inspected and consensus sequences obtained for each PCR 
product. Primer sequences were trimmed and consensus sequences were deposited 
in GenBank (accession numbers: MF468275-MF468286). 
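The cycling programme above is a touchdown protocol. The annealing step starts at 65 °C and drops by 1 °C in each of the first 10 cycles, then 30 further cycles anneal at a fixed 55 °C. The short Python sketch below expands the programme into per-cycle annealing temperatures to make the arithmetic explicit. We assume the first cycle anneals at 65 °C and the decrement applies from the second cycle. Denaturation and extension steps are omitted.

```python
def touchdown_annealing_schedule(start=65, step=1, touchdown_cycles=10,
                                 final=55, final_cycles=30):
    """Annealing temperature (deg C) for every cycle of a touchdown PCR programme."""
    touchdown = [start - step * i for i in range(touchdown_cycles)]
    return touchdown + [final] * final_cycles

schedule = touchdown_annealing_schedule()
print(len(schedule))   # 40
print(schedule[:10])   # [65, 64, 63, 62, 61, 60, 59, 58, 57, 56]
print(schedule[10])    # 55
```

The touchdown cycles then end one degree above the fixed 55 °C annealing step, so the two phases join without a gap.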

Sequencing and de novo assembly of RNA-seq transcripts. Transcripts for sooty 
mangabey were assembled de novo from RNA-seq reads using Trinity on XSEDE’s 
Blacklight supercomputer*’. The RNA-seq reads were pooled from 12 different 
tissues and were prepared by the standard mRNA-seq with the uracil DNA gly- 
cosylase protocol (Illumina kit Part RS-122-2303) and are publicly available from 
the Nonhuman Primate Reference Transcriptome Resource (NCBI SRA acces- 
sion numbers SRX270666 and SRX270667)’. We performed a number of filtering 
steps to prepare reads for de novo assembly, which included removing adapters,
filtering for quality, removing poly A/T tails and removing mtDNA and common 
mammalian rRNA?” After filtering, we used an input of 1,635,074,685 RNA-seq 
reads as the basis for the transcriptome assembly. Using around 550 mostly con- 
tinuous compute hours on Blacklight, we partitioned the computational job into 
three phases described by the Trinity algorithm: Inchworm (around 100 h × 64
cores), Chrysalis (around 400 h × 128 cores), and Quantify Graph and Butterfly
(around 50 h × 64 cores). To circumvent the large amount of I/O generated in the
Quantify Graph phase, we ran Trinity directly from the RAM disk for this phase. 
Using Trinity (version r2012-10-05), the following options were selected:

    Trinity.pl --JM 512G --no_run_chrysalis --seqType fa --single reads.fasta --run_as_paired --CPU 16
    Trinity.pl --JM 512G --no_run_quantifygraph --seqType fa --single reads.fasta --run_as_paired --CPU 16 --bflyGCThreads 4
    Trinity.pl --JM 512G --no_run_butterfly --seqType fa --single reads.fasta --run_as_paired --CPU 16
    Trinity.pl --JM 512G --bflyGCThreads 16 --bflyCPU 32 --seqType fa --single reads.fasta --run_as_paired --CPU 16

The large N25 (6,431 bp), N50 (3,483 bp) and N75 (1,116 bp) values of the resulting assembly were indicative of its success.
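The Nx statistics quoted above summarize assembly contiguity: Nx is the length L such that transcripts of length L or longer together contain at least x% of all assembled bases, so N25 ≥ N50 ≥ N75 by construction. The sketch below computes Nx from a list of transcript lengths; the function name `nx` and the toy lengths are illustrative assumptions, not part of the published pipeline or data.

```python
def nx(contig_lengths, x):
    """Return Nx: the length L such that contigs of length >= L together
    contain at least x% of all assembled bases."""
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    lengths = sorted(contig_lengths, reverse=True)
    threshold = sum(lengths) * x / 100
    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contigs supplied")


# Toy assembly (not the sooty mangabey data): 100 assembled bases in total.
toy = [40, 25, 15, 10, 5, 5]
print(nx(toy, 25), nx(toy, 50), nx(toy, 75))  # prints: 40 25 15
```

Because higher x values must include progressively shorter transcripts, N75 is always the smallest of the three, as in the reported values.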

Pipeline for finding divergent sooty mangabey proteins. C. atys assembly Caty_1.0 protein model predictions were screened against the curated M. mulatta MacaM protein models by alignment with BLASTp (v.2.2.28+). For each MacaM protein model, the C. atys protein model alignment with the lowest e value (or, for equal e values, the highest bitscore) was selected, yielding the set of orthologous C. atys protein predictions most similar to the M. mulatta protein models. The spliced CDS sequence for each Caty_1.0 transcript prediction was extracted with gffread (a utility from Cufflinks v.2.1.1). Caty_1.0 transcript prediction CDS sequences were screened against the de novo RNA-seq assembly transcript models by alignment with BLAT (v.34), and an alignment score was calculated as the number of matching bases minus the number of CDS sequence bases missing in alignment gaps, normalized by the CDS sequence length.
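The best-hit rule described above (one C. atys model per MacaM model, chosen by lowest e value with ties broken by highest bitscore) can be sketched as follows. This assumes BLAST+ tabular output (`-outfmt 6`) with its default twelve columns, Caty_1.0 proteins as queries and MacaM proteins as subjects; the function name and the example hits are hypothetical, and this is not the authors' code.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits_per_subject(tabular_text):
    """For each subject (assumed: a MacaM protein), keep the query (assumed:
    a Caty_1.0 protein) with the lowest e value; ties on e value are broken
    by the highest bitscore."""
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=FIELDS,
                            delimiter="\t")
    for row in reader:
        # Sorting key: smaller e value first, then larger bitscore first.
        key = (float(row["evalue"]), -float(row["bitscore"]))
        subject = row["sseqid"]
        if subject not in best or key < best[subject][0]:
            best[subject] = (key, row["qseqid"])
    return {subject: query for subject, (_, query) in best.items()}


example = "\n".join("\t".join(row) for row in [
    ["caty_A", "macam_1", "98.0", "300", "6", "0", "1", "300", "1", "300", "1e-150", "550"],
    ["caty_B", "macam_1", "97.0", "300", "9", "0", "1", "300", "1", "300", "1e-150", "560"],
    ["caty_C", "macam_1", "99.0", "300", "3", "0", "1", "300", "1", "300", "1e-100", "600"],
    ["caty_D", "macam_2", "95.0", "200", "10", "1", "1", "200", "1", "200", "1e-80", "300"],
])
print(best_hits_per_subject(example))  # {'macam_1': 'caty_B', 'macam_2': 'caty_D'}
```

Note that caty_C has the highest bitscore for macam_1 but loses because its e value is larger; the bitscore only decides between equal e values.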

This score penalizes bases missing from the CDS sequence without penalizing extra sequence that may have been added to the RNA-seq transcript model during the assembly process. Only predicted CDS sequences with a score > 0.99 were retained as supported by RNA-seq data. The Caty_1.0 protein models selected as best matches to MacaM were then cross-referenced with the RNA-seq-supported Caty_1.0 transcript models to eliminate protein models without RNA-seq evidence. The protein alignments to MacaM for these models were then re-examined to find genes for which the alignment identity was less than 97%, the alignment contained gaps, or the alignment did not span the full length of the protein model. These two species shared a common ancestor about 10–11 million years ago, so most proteins are expected to be > 97% identical. This was confirmed by using a maximum likelihood amino acid model (WAG amino acid matrix) to estimate sequence distances between the C. atys and M. mulatta orthologues (Extended Data Fig. 1). Proteins of interest for a differential response to lentivirus infection may be more divergent than expected on average. Genes meeting any of the criteria above therefore represent potentially divergent genes, and they were further screened against the Gene Ontology (GO) term ‘immune response’. This list of divergent immune genes was then further curated by manual inspection of multiple alignments of cDNA transcript and genomic sequences of C. atys (Caty_1.0), M. mulatta (MacaM) and human (GRCh38.p7). Multiple alignment analysis was performed using Multalin (http://multalin.toulouse.inra.fr/). TLR4 and ICAM2 sequence alignments were generated using Jalview.
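The two filters described above, the RNA-seq support score (retained if > 0.99) and the divergence criteria (identity < 97%, any gaps, or partial-length alignment), can be sketched as simple functions. The input quantities are assumed to have been extracted already from the BLAT and BLASTp results; the `ProteinAlignment` record and all function names are hypothetical illustrations, not the pipeline's actual code.

```python
from dataclasses import dataclass


def cds_support_score(matching_bases, cds_bases_in_gaps, cds_length):
    """Matching bases minus CDS bases missing in alignment gaps, normalized
    by CDS length. Only CDS-side quantities enter the formula, so extra
    sequence in the assembled transcript is not penalized."""
    return (matching_bases - cds_bases_in_gaps) / cds_length


@dataclass
class ProteinAlignment:
    # Hypothetical summary of one Caty_1.0 vs MacaM protein alignment.
    identity_pct: float
    gap_opens: int
    aligned_length: int
    protein_length: int


def is_candidate_divergent(aln, identity_cutoff=97.0):
    """Flag alignments with < 97% identity, any gaps, or partial length."""
    return (aln.identity_pct < identity_cutoff
            or aln.gap_opens > 0
            or aln.aligned_length < aln.protein_length)


score = cds_support_score(1495, 5, 1500)
print(round(score, 4), score > 0.99)  # 0.9933 True -> supported by RNA-seq
print(is_candidate_divergent(ProteinAlignment(96.5, 0, 300, 300)))  # True
```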

Gene family evolution methods. To identify rapidly evolving gene families along the C. atys lineage, we obtained peptides for human, chimpanzee, orangutan, gibbon, macaque, baboon, vervet, marmoset and mouse from ENSEMBL 832. The C. atys peptides were obtained from NCBI. To ensure that each gene was counted only once, we used only the longest isoform of each protein in each species. We then performed an all-versus-all BLAST search on these filtered sequences. The resulting e values from the search were used as the main clustering criterion for the MCL program to group peptides into gene families. This resulted in 14,889 clusters. We then removed all clusters present in only a single species, resulting in 10,967 gene families. We also obtained an ultrametric tree from a previous study and added sooty mangabey based on its divergence time from baboon (TimeTree).
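Two bookkeeping steps above, keeping only the longest isoform per gene and discarding clusters found in a single species, can be sketched as below. It assumes MCL label output with one cluster per line and tab-separated member IDs, and a separate lookup from peptide ID to species; the function names and IDs are hypothetical.

```python
def longest_isoforms(peptides):
    """peptides: iterable of (gene_id, isoform_id, sequence) tuples.
    Returns {gene_id: (isoform_id, sequence)} for the longest isoform."""
    best = {}
    for gene_id, isoform_id, seq in peptides:
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (isoform_id, seq)
    return best


def parse_mcl_clusters(text):
    """Assumed MCL output: one cluster per line, members tab-separated."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def drop_single_species_clusters(clusters, species_of):
    """Keep clusters whose members come from more than one species."""
    return [c for c in clusters if len({species_of[p] for p in c}) > 1]


peps = [("g1", "g1.1", "MAAA"), ("g1", "g1.2", "MAAAAA"), ("g2", "g2.1", "MK")]
print(longest_isoforms(peps))
clusters = parse_mcl_clusters("hs1\tmm1\nhs2\ths3\n\ncat1\n")
species = {"hs1": "human", "mm1": "mouse", "hs2": "human", "hs3": "human",
           "cat1": "sooty"}
print(drop_single_species_clusters(clusters, species))  # [['hs1', 'mm1']]
```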

With the gene family data and ultrametric phylogeny as input, we estimated gene gain and loss rates (λ) with CAFE v.3.0. This version of CAFE is able to estimate the amount of assembly and annotation error (ε) present in the input data using a distribution across the observed gene family counts and a pseudo-likelihood search. CAFE then corrects for this error to obtain a more accurate estimate of λ. We find an ε of about 0.04, which implies that 4% of gene families have observed counts that are not equal to their true counts. After correcting for this error rate, we find λ = 0.0020. These values for ε and λ are on par with those previously reported for mammalian datasets (Extended Data Table 2b). Using the estimated value of λ, CAFE infers ancestral gene counts and calculates P values across the tree for each family and lineage to assess the significance of any gene family changes along a given branch. CAFE uses Monte Carlo resampling to assess whether a given family is rapidly evolving. For families found to be rapidly evolving (P < 0.01), it then calculates P values for each lineage within the family using the Viterbi method. Lineages with low P values (P < 0.01) are said to be rapidly evolving.
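To illustrate the idea behind a Monte Carlo test under a single-λ birth–death model, the sketch below simulates one gene family along one branch (each gene duplicates or is lost at rate λ) and asks how often the simulated size change is at least as large as an observed one. This is a deliberately simplified, single-branch illustration under stated assumptions, not CAFE's implementation: CAFE evaluates whole-tree likelihoods, infers ancestral counts and applies its error model. The branch length of 10 tree units and the family sizes are hypothetical.

```python
import random


def simulate_branch(n0, lam, t, rng):
    """Gillespie simulation of a linear birth-death process along a branch
    of length t; each gene duplicates or is lost at the same rate lam."""
    n, elapsed = n0, 0.0
    while n > 0:
        # Total event rate: n genes x (birth rate lam + death rate lam).
        elapsed += rng.expovariate(2 * lam * n)
        if elapsed > t:
            break
        # With equal birth and death rates, each event is a gain or a loss
        # with probability 1/2.
        n += 1 if rng.random() < 0.5 else -1
    return n


def monte_carlo_p(n_ancestor, n_observed, lam, t, reps=10_000, seed=1):
    """Fraction of simulations whose absolute size change is at least the
    observed one (with +1 smoothing so the estimate is never zero)."""
    rng = random.Random(seed)
    observed_change = abs(n_observed - n_ancestor)
    extreme = sum(
        abs(simulate_branch(n_ancestor, lam, t, rng) - n_ancestor) >= observed_change
        for _ in range(reps)
    )
    return (extreme + 1) / (reps + 1)


# lambda from the text; the 10-unit branch and family sizes are hypothetical.
print(monte_carlo_p(5, 5, 0.0020, 10.0))   # no change: 1.0
print(monte_carlo_p(5, 12, 0.0020, 10.0))  # a gain of 7 genes: very small
```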

We observed 1,561 rapidly evolving families across the 10 species of mammals sampled here. Extended Data Table 2c summarizes the gene family changes for all 10 species. Humans have the highest average expansion rate across all families, at 0.20, whereas gibbons have the lowest, at −0.09, meaning that they have the most gene family contractions. C. atys has undergone 535 gene family expansions, of which 96 are rapid expansions, and 340 gene family contractions, of which 48 are rapid contractions.

Genetic distance between C. atys and M. mulatta orthologues. The amino acid sequences of 9,257 C. atys proteins with RNA-seq support (Fig. 1) were aligned to their M. mulatta orthologues as described above. We then ran codeml from the PAML package (v.4.9a) on each of these alignments with the WAG amino acid rate matrix to calculate maximum likelihood genetic distances between the two sequences. A histogram of these distances was generated with R (Extended Data Fig. 1a).
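Extended Data Fig. 1a reports a mean distance of 0.00755 and a 97th percentile of 0.0294 (8,979 of 9,257 genes below it). The sketch below computes the same summary from a list of per-gene distances using only the Python standard library. As a rough, assumption-laden cross-check against the 97% identity cut-off, it also converts a distance to expected identity under a simple Poisson model (1 − p = e^(−d)); that conversion is not the WAG model used for the actual distances.

```python
import math
import statistics


def summarize_distances(distances):
    """Mean, 97th percentile, and number of orthologue pairs below it."""
    mean = statistics.fmean(distances)
    # quantiles(n=100) returns the 1st..99th percentiles; index 96 is the 97th.
    p97 = statistics.quantiles(distances, n=100)[96]
    n_below = sum(d < p97 for d in distances)
    return mean, p97, n_below


def poisson_identity(d):
    """Approximate fraction of identical sites for d substitutions per site
    under a simple Poisson model (an approximation, not WAG)."""
    return math.exp(-d)


print(f"{poisson_identity(0.0294):.3f}")  # 0.971, i.e. roughly 97% identity
```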

TLR4 gene tree. TLR4 nucleotide sequences for 17 primate species were obtained from the NCBI GenBank resource (human: NM_138554.4; rhesus macaque: XM_015116960.1; sooty mangabey: manually curated XM_012091593.1; bonobo: NM_001279223.1; Nancy Ma’s night monkey: XM_012472756.2; drill: XM_011973281.1; colobus monkey: XM_011950060.1; crab-eating macaque: NM_001319615.1; squirrel monkey: XM_003925187.2; baboon: XM_003911309.4; pig-tailed macaque: NM_001305889.1; marmoset: XM_017975811.1; gorilla: XM_004048514.2; chimpanzee: NM_001144863.1; orangutan: AB445642.1; African green monkey: XM_007968248.1; gibbon: XM_003264057.3). These sequences were aligned with PASTA2, and we then constructed a maximum likelihood gene tree with RAxML3, performing 100 bootstrap replicates (Extended Data Fig. 7). Because bootstrap support was low among nodes ancestral to sooty mangabey, drill and baboon, we counted the number of sites that were discordant with respect to the gene tree topology, that is, the number of sites at which baboon and C. atys share the same state, and C. atys and drill share a different state with an outgroup species (one of the two other Old World monkeys).
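The discordant-site count described above can be sketched as a column scan over the alignment. The definition in the text is terse, so this uses one possible reading as an explicit assumption: a site is discordant when C. atys and baboon share a state while drill shares a different state with the outgroup; gapped columns are skipped. Taxon labels, including the default outgroup, and the toy alignment are hypothetical.

```python
def count_discordant_sites(alignment, focal="C_atys", sister="baboon",
                           other="drill", outgroup="rhesus"):
    """Count alignment columns where `focal` and `sister` share a state while
    `other` shares a different state with `outgroup` (one reading of the
    criterion in the text). alignment: dict taxon -> aligned sequence."""
    seqs = [alignment[t] for t in (focal, sister, other, outgroup)]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    count = 0
    for a, b, c, o in zip(*seqs):
        if "-" in (a, b, c, o):
            continue  # skip gapped columns
        if a == b and c == o and a != c:
            count += 1
    return count


toy_alignment = {
    "C_atys": "ACGTA",
    "baboon": "ACGTT",
    "drill":  "GCGAA",
    "rhesus": "GCGAC",
}
print(count_discordant_sites(toy_alignment))  # 2 (columns 1 and 4)
```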

Sample collection and processing. Peripheral blood samples from SIV-negative rhesus macaques and SIV-negative sooty mangabeys were collected by venipuncture according to standard procedures at the Yerkes National Primate Research Center of Emory University and in accordance with US National Institutes of Health guidelines. Human blood samples were obtained from healthy donors at the Yerkes National Primate Research Center in accordance with Institutional Review Board protocol IRB0004582 and all relevant ethical regulations. Informed consent was obtained from all blood donors. Peripheral blood mononuclear cells (PBMCs) were isolated from whole blood using Ficoll density-gradient centrifugation.

In vitro TLR-ligand stimulation assay. The assay used in this study is a modified version of a previously described procedure. Ultrapure LPS (Escherichia coli O111:B4) and monophosphoryl lipid-A (Salmonella minnesota) were purchased from Invivogen. Whole blood collected in EDTA vacutainers was diluted 1:4 with RPMI 1640 medium, and 195-µl aliquots were transferred to 96-well, round-bottom microtitre plates. Agonists were diluted in RPMI 1640, and 5 µl was applied to the wells at the following final concentrations: LPS, 1,000–10 ng ml−1; lipid-A, 10–1 µg ml−1. Suspensions were then mixed by pipetting and incubated at 37°C, 5% CO2 for 4 h. After incubation, plates were centrifuged at 700 r.p.m. for 10 min, and 120 µl of cell-free supernatant was removed and stored at −80°C until the assay was carried out. Each TLR ligand at a given concentration was tested in triplicate for each animal.
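Because 5 µl of agonist is added to a 195-µl aliquot (200 µl total), each agonist working solution must be 40-fold more concentrated than its intended final concentration. The short calculation below makes that explicit; it assumes no other volume is added to the well, and the function name is illustrative.

```python
def stock_concentration(final_conc, well_volume_ul=195.0, added_volume_ul=5.0):
    """Working-solution concentration needed so that added_volume_ul of
    agonist in a well already holding well_volume_ul reaches final_conc
    (result is in the same units as final_conc)."""
    total_ul = well_volume_ul + added_volume_ul
    return final_conc * total_ul / added_volume_ul


for label, final, unit in [("LPS", 1000, "ng/ml"), ("LPS", 10, "ng/ml"),
                           ("lipid-A", 10, "ug/ml"), ("lipid-A", 1, "ug/ml")]:
    print(f"{label}: {final} {unit} final -> {stock_concentration(final):g} {unit} working solution")
```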

Cytokine bead array (CBA). Samples were obtained from sooty mangabeys and rhesus macaques housed at the YNPRC. Sooty mangabeys were naturally infected at the YNPRC, and rhesus macaques had been infected previously with SIVsmm as previously described. Supernatant levels of TNF and IL-6 were measured using the human inflammation CBA kit (BD Biosciences Immunocytometry Systems) according to the manufacturer's instructions, with the modification that the volumes of supernatant, antibody-coupled bead mix and PE-conjugated detection antibody solution were all reduced to 25 µl instead of 50 µl. After incubation, samples were washed with 2% paraformaldehyde in PBS, resuspended in 150 µl PBS and analysed using a FACSCalibur flow cytometer (BD Biosciences Immunocytometry Systems). The average of triplicate cytokine measurements was used as the representative value for each animal, and differences in cytokine levels between species groups were tested for statistical significance using unpaired t-tests in Prism 6.0. To quantify the level of TLR4 mRNA, and to perform linear regression of TLR-signalling molecules against TNF and IL-6 cytokine levels in the LPS-stimulated blood samples from the longitudinal SIVsmm-infected animals, we used microarray expression data from matched whole-blood samples; these data are available from the NCBI GEO database (accession GSE16147).
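The per-animal summarization and group comparison described above can be sketched with the standard library: average each animal's triplicate, compare species with Welch's unequal-variance t statistic (the statistical analysis section states that Welch's correction was applied), and summarize a TLR-signalling gene against TNF with a Pearson r. The Fisher z-transform confidence interval is an assumption about how intervals of the kind shown in Extended Data Table 3 could be obtained, not a statement of how Prism computed them. For P values, `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.pearsonr(x, y)` are the usual SciPy calls. All values below are hypothetical.

```python
import math
import statistics


def per_animal_means(triplicates):
    """{animal_id: [rep1, rep2, rep3]} -> {animal_id: mean of replicates}."""
    return {animal: statistics.fmean(reps) for animal, reps in triplicates.items()}


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pearson_with_fisher_ci(x, y, z_crit=1.959964):
    """Pearson r and an approximate 95% CI via Fisher's z transform.
    Raises ValueError (math domain error) if |r| == 1."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need paired data with n >= 4")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z, half_width = math.atanh(r), z_crit / math.sqrt(n - 3)
    return r, math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical TNF triplicates (arbitrary units), one entry per animal.
rm = per_animal_means({"RM1": [410, 395, 420], "RM2": [380, 360, 371], "RM3": [450, 470, 440]})
sm = per_animal_means({"SM1": [120, 131, 118], "SM2": [95, 101, 99], "SM3": [140, 150, 138]})
print(welch_t(list(rm.values()), list(sm.values())))
print(pearson_with_fisher_ci([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```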

Plasma viral load measurement. SIVsmm plasma viral RNA levels were quantified using qPCR as described previously.

RNA-seq analysis of LPS-stimulated monocytes. RNA-seq analysis was conducted at the Yerkes Nonhuman Primate Genomics Core Laboratory (http://www.yerkes.emory.edu/nhp_genomics_core/). CD14+ monocytes were isolated from Ficoll-isolated PBMCs using CD14 MicroBeads according to the manufacturer's instructions (Miltenyi Biotec). Subsequently, 0.4 × 10^6 cells were stimulated for 6 h with 10 ng ml−1 LPS and then immediately lysed in 350 µl RLT buffer (Qiagen). RNA was purified using Micro RNeasy columns (Qiagen), and RNA quality was assessed using an Agilent Bioanalyzer. Then, 10 ng of total RNA was used as input for mRNA amplification by 5′ template-switch PCR with the Clontech SMART-Seq v.4 Ultra Low Input RNA kit, according to the manufacturer's instructions. Amplified mRNA was fragmented and appended with dual-indexed barcodes using Illumina Nextera XT DNA Library Prep kits. Libraries were validated by capillary electrophoresis on an Agilent 4200 TapeStation, pooled and sequenced on an Illumina HiSeq 3000 (100-bp paired-end reads) at an average read depth of 18 million reads. RNA-seq data were analysed by alignment and annotation to either the MacaM v.7.8.2 assembly of the Indian rhesus macaque genome (available at https://www.unmc.edu/rhesusgenechip/index.htm) or the Caty_1.0 assembly. Alignment was performed using STAR v.2.5.2b with the annotation as a splice junction and abundance estimation reference, and non-unique mappings were removed from downstream analysis. Transcripts were annotated using both the MacaM and Caty_1.0 assemblies and annotations as described in the text. Transcript abundance was estimated internally in STAR using the HTSeq algorithm, and differential expression analyses were performed using the DESeq2 package. To quantitatively compare the degree to which LPS treatment induced inflammatory gene expression between species, we used gene set enrichment analysis (GSEA). GSEA was performed using the desktop module available from the Broad Institute (https://www.broadinstitute.org/gsea/). Gene ranks for contrasts of LPS-treated versus untreated samples were calculated from the normalized expression tables using the signal-to-noise metric for each species separately. Ranked datasets contrasting LPS-treated versus untreated samples were tested for enrichment of the gene sets ‘HALLMARK_TNFA_SIGNALING_VIA_NFKB’ (M5890) and ‘HALLMARK_IL6_JAK_STAT3_SIGNALING’ (M5897) from the Molecular Signatures Database (http://www.broadinstitute.org/gsea/msigdb/index.jsp), using gene set permutation to test for statistical significance. Heat maps and other visualizations were generated using Partek Genomics software, v.6.6.
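The signal-to-noise ranking metric used above is the difference of group means divided by the sum of group standard deviations. The sketch below ranks genes for one species' LPS-treated versus untreated contrast using sample standard deviations. The GSEA software may apply additional adjustments (such as a floor on the standard deviation) that are not reproduced here, and the gene values are hypothetical.

```python
import statistics


def signal_to_noise(group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b), using sample standard deviations.
    Undefined (ZeroDivisionError) if both groups have zero variance."""
    sd_sum = statistics.stdev(group_a) + statistics.stdev(group_b)
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / sd_sum


def rank_genes(expression, treated_cols, untreated_cols):
    """expression: {gene: [normalized values per sample]}. Returns genes
    sorted from most LPS-induced to most LPS-repressed."""
    scores = {}
    for gene, values in expression.items():
        treated = [values[i] for i in treated_cols]
        untreated = [values[i] for i in untreated_cols]
        scores[gene] = signal_to_noise(treated, untreated)
    return sorted(scores, key=scores.get, reverse=True)


expr = {
    "TNF":  [10, 11, 12, 1, 2, 3],
    "ACTB": [5, 5.5, 6, 5, 5.5, 6],
    "DOWN": [1, 2, 3, 10, 11, 12],
}
print(rank_genes(expr, treated_cols=[0, 1, 2], untreated_cols=[3, 4, 5]))
# ['TNF', 'ACTB', 'DOWN']
```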

ICAM2 exon splice junction analysis. We examined observed splicing in RNA-seq alignments from all 24 LPS-stimulated monocyte samples and in alignments derived from deep RNA-seq (over 50 million reads) of two samples of flow-sorted, purified blood C. atys conventional dendritic cells (cDCs, defined as CD3− CD14− CD20− CD123− HLA-DR+ CD11c+) that were prepared alongside the monocytes. To provide additional depth, we also included RNA-seq data from two flow-purified M. mulatta non-classical monocyte samples (defined as CD14 CD16+ HLA-DR+ NKG2− CD3− CD20−) and one C. atys sample of CD4+ T transitional memory cells (CD4+ TTM, defined as CD3+ CD4+ CD8− CD45RA− CD95+ CD28+ CCR7− CD62L− CD14− CD16− CD20−). Reads in the alignment (BAM) files that mapped from 5 kb upstream to 5 kb downstream of the ICAM2 locus were scanned by a custom Perl script that recorded evidence of splicing from the CIGAR field and accumulated counts of reads supporting either splicing or read-through at each site. Splice-site counts for all samples were summed and compared to find the proportion of reads supporting each splice variant or intron retention.
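The core of the splice-evidence step, reading junctions from the CIGAR field, can be sketched in Python. This is not the authors' Perl script (which is available from the repository given under Code availability) and it omits their read-through counting. It relies only on the SAM specification: an `N` operation marks a skipped reference region (an intron in spliced alignments), and `M`, `D`, `N`, `=` and `X` consume reference bases. The read positions and CIGAR strings are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference positions (SAM specification).
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based and inclusive, for
    each N operation of a read whose first aligned base is at `pos`."""
    junctions = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


def count_junctions(reads):
    """reads: iterable of (pos, cigar) tuples, e.g. the POS and CIGAR fields
    of BAM records overlapping the locus of interest."""
    counts = Counter()
    for pos, cigar in reads:
        counts.update(junctions_from_cigar(pos, cigar))
    return counts


reads = [(100, "50M200N50M"), (120, "30M200N70M"), (90, "60M100N40M")]
print(count_junctions(reads))  # Counter({(150, 349): 2, (150, 249): 1})
```

Summing such counters across samples and dividing each junction's count by the total gives the per-variant proportions described in the text.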

NF-κB luciferase reporter assay. Protein-expression constructs encoding human TLR4, MmTLR4, CaTLR4, MmTLR4 with the C terminus of CaTLR4, and CaTLR4 with the C terminus of MmTLR4 were generated by the Emory Custom Cloning Core Division using standard cloning techniques. HEK293T cells were obtained from ATCC and regularly checked for mycoplasma contamination.

To determine the responsiveness of MmTLR-4 and CaTLR-4 to LPS, HEK293T cells were seeded in poly-L-lysine-coated 96-well plates and transfected in triplicate using a standard calcium phosphate transfection protocol. Cells were co-transfected with expression plasmids for human MD-2 (pEFBOS, 5 ng), human CD14 (pcDNA3, 5 ng) and different TLR-4 orthologues or chimaeras (pEF1a, 2.5 ng). The MD-2 and CD14 expression plasmids were provided by A. Medvedev; the NF-κB reporter construct was made available by B. Baumann. A firefly luciferase reporter under the control of three NF-κB-binding sites (100 ng) and a Gaussia luciferase reporter (5 ng) under the control of the pTAL promoter were co-transfected to monitor NF-κB activity. The pTAL promoter construct contains a minimal TATA-like promoter (pTAL) region from the herpes simplex virus thymidine kinase (HSV-TK) promoter (Clontech) that is unresponsive to NF-κB and served as an internal control. To activate NF-κB, cells were stimulated with 5 µg ml−1 LPS (E. coli O26:B6, eBioscience) for 5 h. At 40 h after transfection, a dual luciferase assay was performed, and the firefly luciferase signals were normalized to the corresponding Gaussia luciferase control values.
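The normalization described above divides each firefly (NF-κB-driven) signal by its Gaussia (NF-κB-independent) control and then summarizes the triplicate wells. The sketch assumes the two readings are paired by well; the fold-induction helper (stimulated over unstimulated) is a hypothetical downstream step that is not stated in the text. Readings are made-up values.

```python
import statistics


def normalized_nfkb_activity(firefly, gaussia):
    """Mean of per-well firefly/Gaussia ratios for one construct and
    condition (e.g. the triplicate wells described in the text)."""
    if len(firefly) != len(gaussia):
        raise ValueError("firefly and Gaussia readings must be paired by well")
    return statistics.fmean([f / g for f, g in zip(firefly, gaussia)])


def fold_induction(stimulated, unstimulated):
    """Hypothetical extra step: LPS-stimulated over unstimulated activity."""
    return stimulated / unstimulated


lps = normalized_nfkb_activity([800.0, 820.0, 790.0], [100.0, 102.0, 98.0])
mock = normalized_nfkb_activity([200.0, 210.0, 190.0], [100.0, 105.0, 95.0])
print(round(fold_induction(lps, mock), 2))
```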

qPCR. TLR stimulations of whole blood for qPCR were performed using the same method as for the cytokine protein assay, but scaled proportionally to use 1 ml of blood as input. After stimulation, leukocytes were recovered by centrifugation at 700 r.p.m. for 5 min, and erythrocytes were removed by incubation in ACK lysis buffer. Cells were lysed in 350 µl of RLT buffer, and RNA was purified using the RNeasy Mini kit (Qiagen) according to the manufacturer’s instructions. qPCR was performed on RNA as previously described. Primers for qPCR were designed using Primer Express software (Applied Biosystems) in regions of 100% nucleotide identity between M. mulatta and C. atys: 12S rRNA (endogenous standard) forward 5′-CCCCCTAGAGGAGCCTGTTC-3′, 12S rRNA reverse 5′-GGCGGTATATAGGCTGAGCAA-3′; TNF forward 5′-GCCCTGGTATGAGCCCATCTA-3′, TNF reverse 5′-CGAGATAGTCGGGCAGATTGA-3′; IL6 forward 5′-GAGAAAGGAGACATGTAACAGGAGTAAC-3′, IL6 reverse 5′-TGGAAGGTTCAGGTTGTTTTCTG-3′. Fold change was calculated by dividing the normalized post-treatment sample quantity by the normalized untreated control quantity from the same animal, and the fold changes were averaged for each species.
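The fold-change calculation above can be written out as follows: each target quantity (TNF or IL6) is normalized to the 12S rRNA quantity from the same sample, treated is divided by untreated within each animal, and the per-animal fold changes are averaged per species. The input layout, function names and quantities are hypothetical; quantities are assumed to come already from the qPCR standard curves.

```python
import statistics


def normalized_quantity(target_qty, reference_qty):
    """Target quantity (e.g. TNF) divided by the 12S rRNA quantity."""
    return target_qty / reference_qty


def species_mean_fold_change(samples):
    """samples: {animal: {"treated": (target, ref), "untreated": (target, ref)}}.
    Fold change per animal = normalized treated / normalized untreated from
    the same animal; the species value is the mean of those fold changes."""
    folds = []
    for s in samples.values():
        treated = normalized_quantity(*s["treated"])
        untreated = normalized_quantity(*s["untreated"])
        folds.append(treated / untreated)
    return statistics.fmean(folds)


sooty = {
    "SM1": {"treated": (80.0, 2.0), "untreated": (10.0, 1.0)},  # fold 4
    "SM2": {"treated": (60.0, 1.0), "untreated": (20.0, 1.0)},  # fold 3
}
print(species_mean_fold_change(sooty))  # 3.5
```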

Flow cytometry of PBMCs. Multicolour flow cytometry staining was performed using the following antibodies and reagents: CD3-APC/Cy7 (SP34-2), CD14-PE/Cy7 (M5E2) and CD20-PE/Cy5 (2H7) from BD; CD4-BV650 (OKT4), CD8-BV711 (RPA-T8), ICAM-2-FITC (CBR-IC2/2) and mouse IgG2a,κ-FITC (MOPC-173) isotype control from BioLegend; and LIVE/DEAD Fixable Aqua from Thermo Fisher Scientific. Cells were stained for flow cytometry, and data were acquired on an LSR II cytometer (BD) and analysed with FlowJo 10 software (TreeStar). Further analyses were performed using Prism (GraphPad) and Excel (Microsoft Office 2011) software.

ICAM-2 western blot. PBMCs were lysed in RIPA buffer, and equal amounts of cell lysate were boiled after addition of sample buffer containing β-mercaptoethanol, resolved by 4–15% SDS–PAGE (Bio-Rad) and transferred to an Immobilon-P PVDF membrane (Millipore). Membranes were then blocked for 1 h in blocking buffer (Bio-Rad) and incubated overnight with a polyclonal rabbit ICAM-2-specific antibody (Bethyl). After washing (PBS with 0.05% Tween-20), membranes were incubated with an HRP-conjugated anti-rabbit secondary antibody for an additional 1 h and washed, and HRP activity was detected using the SuperSignal West Pico Kit (Bio-Rad) and visualized using the ChemiDoc XRS+ (Bio-Rad). The membrane was then stripped with buffer (2% SDS, 0.5 M Tris, pH 2.2) and blocked again, and β-actin was detected using a rabbit anti-β-actin antibody as the primary antibody and an HRP-conjugated anti-rabbit antibody as the secondary antibody.

Statistical analysis. Statistical significance was determined using an unpaired Student's t-test with Welch's correction. P < 0.05 was considered significant: *P < 0.05; **P < 0.01; NS, not significant. Data are mean ± s.d. or s.e.m., as indicated. Significance for comparisons of mRNA levels of individual genes in RNA-seq data was tested using the Wald test as part of the DESeq2 workflow. Bars represent group means, and dots represent read counts for individual samples normalized to library size. Reported P values are adjusted using the Benjamini–Hochberg correction.
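The Benjamini–Hochberg adjustment mentioned above scales each P value by m/rank (m tests, rank in ascending order) and then enforces monotonicity from the largest P value downwards, capping at 1. DESeq2 reports BH-adjusted values (`padj`) by default, after an independent-filtering step that this sketch does not reproduce; the sketch corresponds to R's `p.adjust(p, method = "BH")` and uses made-up P values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values never decrease
    # with rank and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(p, 4) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
# [0.02, 0.04, 0.04, 0.02]
```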
Code availability. We used a custom script to quantify ICAM-2 splice junctions. 
This script is available at Github: https://github.com/BosingerLab/splicing-analysis. 
Data availability. Raw sequences of the C. atys reference genome have been deposited in NCBI under BioProject accession number PRJNA157077. The genome assembly is available at NCBI as Caty_1.0 (RefSeq accession GCF_000955945.1). The multi-tissue C. atys RNA-seq reads are available from the Nonhuman Primate Reference Transcriptome Resource (NCBI SRA accession numbers SRX270666 and SRX270667). Data from Sanger sequencing of TLR4 and ICAM2 are available at NCBI (accession numbers MF468275–MF468286). Microarray data used for TLR-4 measurement and linear regression with TNF and IL-6 are available from the NCBI GEO database (accession GSE16147). The RNA-seq data for LPS-stimulated monocytes were submitted to the GEO database (accession numbers GSM2711028–GSM2711051 and GSE101617).
Extended Data Figure 1 | Genetic distances of C. atys and M. mulatta orthologues and protein sequence alignments of CD4 and CCR5. a, Genetic distances of C. atys and M. mulatta orthologues. The dotted blue line represents a mean distance of 0.00755 expected substitutions, and the solid red line represents the 97th percentile. This percentile indicates that 8,979 out of 9,257 genes have a distance less than 0.0294. b, Pairwise alignment of CD4 and CCR5 protein sequences for C. atys and M. mulatta. Sequences were aligned using Jalview v.2.9.0.
Extended Data Figure 2 | Sequence alignment of ICAM-2 protein and exon sequence analysis of ICAM2. a, Pairwise alignment of predicted ICAM-2 protein models for sooty mangabey and rhesus macaque. Exon structure is highlighted based on human ICAM-2. Alignment was performed using Jalview v.2.9.0. b, The sequence of exon 3 of CaICAM2 was confirmed in 10 additional individuals. Sequencing reads were aligned to the C. atys reference genome and visualized using Integrative Genomics Viewer (IGV). The red arrow indicates the position of the 499-bp genomic deletion in C. atys.
Extended Data Figure 3 | Predicted model of the ICAM2 gene structure and ICAM2 genome sequence alignments. a, Predicted model of ICAM2 gene structure of M. mulatta and C. atys and the location of PCR primers for Sanger sequencing. Light blue, untranslated region; dark blue, CDS; red lines, intronic sequence; dotted line, exonic and intronic sequences present in human ICAM2 and MmICAM2 but not in CaICAM2; red box, the sequence that would be intronic in MmICAM2 but is included in the exonic sequence of CaICAM2; the light-purple box for CaICAM2 exon 4 indicates that the exon 4 sequence of MmICAM2 is present in CaICAM2 but is not included in the CaICAM2 CDS because of a stop codon in CaICAM2 exon 3. Primer positions are indicated by arrows. Predicted PCR products are indicated by thick lines. Primers Ex3_F and Ex3_R were designed to amplify a region spanning a putative genomic deletion that includes the 3′ region of CaICAM2 exon 3 and intron 3. b, Alignment of ICAM2 genomic sequences. Sanger sequencing of 2 rhesus macaques and 2 sooty mangabeys (including the Caty_1.0 reference animal) was performed to confirm the ICAM2 genomic deletion specific to C. atys. Starting at MmICAM2 nucleotide position 3166, sequences were aligned using Jalview v.2.9.0. Dashed lines denote the deletion in C. atys. RM, rhesus macaque; SM, sooty mangabey.
Extended Data Figure 4 | ICAM2 splice junction analysis in C. atys and M. mulatta by RNA-seq read alignment. a, Quantification of observed splicing. Splice site counts for RNA-seq read alignments were added together, and sites with more than 100 total reads were compared to find the proportion of reads supporting each splice variant or intronic retention. b, MmICAM2 splicing analysed by RNA-seq read alignment to the reference genome and visualized in IGV. c, CaICAM2 splicing analysed by RNA-seq read alignment to the reference genome and visualized in IGV.
Extended Data Figure 5 | Sequence alignment of TLR-4 and the structure of the TLR4 gene. a, Pairwise alignment of TLR-4 protein sequences for C. atys and M. mulatta. The sequence difference at the C terminus is highlighted in red. Sequences were aligned using Jalview v.2.9.0. b, TLR4 gene structure and location of PCR primers. Light blue, untranslated region; dark blue, CDS; red lines, intronic sequence. Primer positions are indicated by arrows. The predicted PCR product is indicated by a thick line. Primers TLR4_F and TLR4_R were designed to amplify a region including a putative stop-loss mutation present in CaTLR4 but not in MmTLR4. c, Chromatograms showing the stop-loss (indicated by arrows) in the TLR4 gene in C. atys with respect to M. mulatta. The relevant codon is underlined.
additional individuals. Sequencing reads were aligned to the Caty_1.0 reference genome and visualized in IGV. The red arrow indicates the
C terminus from different primate species. Starting at human TLR4 nucleotide position 2461, sequences were aligned using Jalview v.2.9.0.

Extended Data Figure 7 | Maximum likelihood gene tree of TLR4. This topology corresponds to the accepted species relationships for Old World monkeys. However, low bootstrap support among the nodes ancestral to C. atys, drill and baboon indicates that several sites within the gene do not support that ordering, which may reflect incomplete lineage sorting. The table on the left shows these sites.
Extended Data Figure 8 | Analysis of cytokine expression and release after activation of TLR-4. a, TNF release from whole blood upon stimulation with lipid-A. Whole blood was stimulated with lipid-A at the indicated concentrations for 4 h, and cytokine secretion was measured by cytometric bead array. n = 5 biologically independent samples for M. mulatta; n = 4 biologically independent samples for C. atys. b, IL-6 release from whole blood upon stimulation with lipid-A. Whole blood was stimulated with lipid-A at the indicated concentrations for 4 h, and cytokine secretion was measured by cytometric bead array. n = 8 biologically independent samples for M. mulatta; n = 9 biologically independent samples for C. atys. c, SIVsmm plasma viral load for M. mulatta and C. atys. SIVsmm RNA levels in plasma were quantified at the indicated time points after intravenous inoculation with a primary uncloned SIVsmm C. atys isolate. n = 5 biologically independent samples for each species. d, TLR4 mRNA levels in LPS-stimulated blood samples. To test the level of TLR4 expression in the LPS-stimulated blood samples shown in Fig. 3e, we isolated RNA from whole blood of time-point-matched replicate samples using PAXgene Blood RNA tubes, and analysed expression using Affymetrix GeneChip Rhesus Macaque Genome Arrays, which contain three independent probesets specific for MmTLR4 (denoted on the x axis). Probeset intensities are displayed along the y axis as RMA-normalized values. n = 3 biologically independent samples for M. mulatta; n = 4 biologically independent samples for C. atys. a–d, Dots represent individual animals, and the bar represents the mean. Unpaired two-sided Student’s t-test; P values are indicated. e, TNF and IL6 mRNA levels in LPS-stimulated monocytes from M. mulatta and C. atys. RNA-seq was used to assay global changes in gene expression after LPS stimulation of primary CD14+ monocytes. Significance for comparisons of mRNA levels of individual genes was tested using the Wald test as part of the DESeq2 workflow. Bars represent group means, and dots represent read counts for individual samples normalized to library size. Indicated P values are adjusted using the Benjamini–Hochberg correction. n = 6 biologically independent samples for each species.
Extended Data Figure 9 | LPS-mediated induction of TNF and IL-6 inflammatory signalling is globally attenuated in C. atys. a, b, Data shown are the leading-edge genes depicted in Fig. 3f, g (GSEA plots), for TNF-signalling genes (a) and IL-6-signalling genes (b). Values are the log2-transformed difference between LPS-treated and untreated samples for each individual animal. Genes selected are the combination of leading-edge (core-enriched) genes from the M. mulatta and C. atys GSEA analyses for each pathway. The gene sets selected for enrichment testing were obtained from the MSigDB hallmark collection and are denoted at the top of each panel. Genes were organized by hierarchical clustering, using Spearman dissimilarity to estimate distances between genes and average linkage to estimate distances between clusters. The colour scale at the bottom denotes the maximum and minimum on a log2 scale. For animal study source data, see Supplementary Table 2.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


Extended Data Table 1 | Amino acid divergence in proteins from C. atys identified by the immunogenomic comparison pipeline 


Gene | Function | RM length (aa) | SM length (aa) | Identity (%)
CD24 | B cell and granulocyte activation/differentiation | 78 | 77 | 88.5
APOBEC3C | retroviral restriction factor | 190 | 190 | 91.6
DEFB129 | antimicrobial | 183 | 183 | 94.5
CLEC2D | inhibits NK-cell-mediated lysis | 198 | 199 | 94.5
GZMA | cell lysis mediated by CD8+ T cells and NK cells | 262 | 262 | 94.7
PGLYRP1 | peptidoglycan recognition on Gram-positive bacteria | 196 | 196 | 94.9
CCL24 | chemoattractant for resting T cells | 119 | 119 | 95.0
C5AR1 | complement receptor | 350 | 350 | 95.1
BST2 | retroviral restriction factor | 182 | 182 | 95.6
PF4 | coagulation, chemoattractant for neutrophils and monocytes | 196 | 196 | 96.0
S100A7 | antimicrobial, immunomodulatory | 101 | 101 | 96.0
CLEC6A | mannose-dependent pathogen recognition, proinflammatory | 209 | 209 | 96.2
MB21D1 | antiviral, cytosolic DNA sensor | 522 | 522 | 96.2
BPI | antimicrobial, LPS-sensing | 487 | 487 | 96.3
PRG3 | cytotoxic and cytostimulatory activities | 225 | 225 | 96.4
GSDMD | antimicrobial, pyroptosis | 484 | 484 | 96.5
CLEC4A | pattern recognition receptor | 204 | 204 | 96.6
CLEC4D | inflammation and immune responses | 215 | 215 | 96.7
PPBP | chemoattractant and activator of neutrophils | 128 | 128 | 96.8
CD4 | T cell receptor activation, HIV/SIV receptor | 458 | 458 | 96.9
CTSG | lysosomal antigen processing | 255 | 255 | 96.9
CD33 | adhesion molecule on myeloid cells | 359 | 359 | 96.9
LY96/MD2 | associates with TLR4 for LPS binding | 160 | 160 | 96.9
CCL11 | chemoattractant for eosinophils | 97 | 97 | 96.9

aa, amino acids.
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Extended Data Table 2 | Analysis of immune gene families across species 


Panel A 
Change Type gene family function SM AGM RM _ Human Chimp’ Baboon 
‘ ADAM ' ‘ 
Expansion (+5) metalloproteinases cytokine regulation 30 20 27 22 18 24 
Expansion (+6) = scavenger receptors LDL binding 17 9 11 9 10 10 
Expansion (+6) — butyrophilin lymphocyte deactivation 16 10 9 9 t 10 
Expansion (+3) © TNFRSF10/TRAIL apoptosis induction 6 5 3 
Expansion (+2) CD300 lipid-binding, immunomodulation 5 3 2 3 3 3 
Contraction (-3) | C-C-motif chemokines | chemoattractant for immune cells 6 9 10 20 9 10 


Panel B 
A(No Error Model) e(Estimatederror) A (Error Model = 
é) 
10 species in this study 0.00268 0.04268 0.00204 
11 species Gibbon 0.00258 0.04101 0.00141 
Genome Project 7 
10 mammal dataset “° 0.00238 0.07324 0.00186 
Panel C
Species | Families expanded | Genes gained | Avg. genes/expansion | Families contracted | Genes lost | Avg. genes/contraction | No change | Avg. expansion
Sooty | 535 (96) | 1153 | 2.16 | 340 (48) | 494 | 1.45 | 10106 | 0.024528
Human | 1042 (276) | 3471 | 3.33 | 192 (10) | 210 | 1.09 | 9747 | 0.200967
Marmoset | 1027 (122) | 2213 | 2.15 | 668 (23) | 841 | 1.26 | 9286 | 0.107504
Chimp | 161 (23) | 384 | 2.39 | 874 (69) | 1137 | 1.3 | 9946 | -0.081244
Gibbon | 354 (13) | 552 | 1.56 | 1089 (92) | 1466 | 1.35 | 9538 | -0.085529
Baboon | 290 (61) | 660 | 2.28 | 624 (41) | 737 | 1.18 | 10067 | -0.028084
Orang | 548 (65) | 1032 | 1.88 | 749 (14) | 820 | 1.09 | 9684 | -0.003921
Macaque | 1101 (203) | 2904 | 2.64 | 783 (22) | 835 | 1.07 | 9097 | 0.100666
Mouse | 631 (38) | 2719 | 4.31 | 855 (9) | 1027 | 1.2 | 9495 | 0.013404
Vervet | 294 (19) | 658 | 2.24 | 674 (59) | 921 | 1.37 | 10013 | -0.039209


a, Expansion and contraction of immune gene families across six primate species. b, Assembly and annotation error estimations and gene gain and loss rates in a single-λ model in 13 mammals.
c, Summary of gene gain and loss events inferred after correcting for annotation and assembly errors across all 13 species. The number of rapidly evolving families is shown in parentheses for each
type of change. AGM, African green monkey.
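In panel c, the "genes/expansion" and "genes/contraction" columns appear to be the number of genes gained (or lost) divided by the number of families that expanded (or contracted). The short Python sketch below assumes that reading and recomputes the averages for a few rows copied from the table. It is a consistency check only, not the authors' analysis code.

```python
# Rows copied from panel c: (species, families expanded, genes gained,
# families contracted, genes lost). Assumption: avg = genes / families.
ROWS = [
    ("Sooty", 535, 1153, 340, 494),
    ("Human", 1042, 3471, 192, 210),
    ("Chimp", 161, 384, 874, 1137),
    ("Vervet", 294, 658, 674, 921),
]


def per_family_averages(rows):
    """Return {species: (genes per expansion, genes per contraction)}, rounded to 2 dp."""
    out = {}
    for species, fam_up, genes_up, fam_down, genes_down in rows:
        out[species] = (round(genes_up / fam_up, 2), round(genes_down / fam_down, 2))
    return out


if __name__ == "__main__":
    for species, (up, down) in per_family_averages(ROWS).items():
        print(f"{species}: {up} genes/expansion, {down} genes/contraction")
```

Under this reading, the recomputed values match the tabulated 2.16/1.45 (Sooty), 3.33/1.09 (Human), 2.39/1.3 (Chimp) and 2.24/1.37 (Vervet).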
Extended Data Table 3 | Correlation analysis between TLR-signalling molecules and gene expression


Gene name | Gene symbol (RM) | Affymetrix probeset ID | r (SM, TNF) | P value (SM, TNF) | Lower CI (SM) | Upper CI (SM) | r (RM, TNF) | P value (RM, TNF) | Lower CI (RM) | Upper CI (RM)
AP1 JUN MmugDNA.22829.1.S1_at -0.52 0.07 -0.83 0.04 0.14 0.7 -0.54 0.71 
CD14 CD14 MmuSTS.1982.1.S1_at -0.28 0.35 -0.72 0.32 0.3 0.41 -0.41 0.78 
IKKA CHUK MmuSTS.1867.1.S1_at 0.25 0.4 -0.35 0.71 -0.31 0.38 -0.79 0.40 
IKKB IKBKB MmugDNA.8188.1.S1_at -0.51 0.07 -0.83 0.05 0.05 0.88 -0.60 0.66 
IKKG IKBKG MmuSTS.4600.1.S1_at -0.26 0.4 -0.71 0.34 0.14 0.7 -0.54 0.71 
IRAK1 IRAK1 MmugDNA.38816.1.S1_at -0.33 0.28 -0.74 0.27 0.3 0.39 -0.40 0.78 
IRF7 IRF7 MmugDNA.29625.1.S1_at -0.6 0.03 -0.86 -0.07 -0.16 0.66 -0.72 0.52 
JNK MAPK8 MmugDNA.7819.1.S1_at 0.48 0.1 -0.10 0.81 -0.27 0.45 -0.77 0.43 
MYD88 MYD88 MmugDNA.10008.1.S1_at -0.41 0.17 -0.78 0.18 -0.08 0.83 -0.67 0.58
NFKB1 NFKB1 MmuSTS.3011.1.S1_at -0.29 0.34 -0.73 0.31 0.2 0.58 -0.49 0.74 
NFKB2 NFKB2 MmugDNA.25060.1.S1_at -0.5 0.08 -0.82 0.07 0.07 0.86 -0.59 0.67 
P38 MAPK1 MmugDNA.1694.1.S1_at 0.43 0.15 -0.16 0.79 0.09 0.8 -0.57 0.68 
RIP1 RIPK1 MmugDNA.40799.1.S1_at -0.36 0.23 -0.76 0.24 0.21 0.57 -0.49 0.74 
TAB1 TAB1 MmuSTS.1553.1.S1_at -0.15 0.63 -0.65 0.44 0.42 0.23 -0.29 0.83 
TAK1 MAP3K7 MmugDNA.34734.1.S1_s_at 0.87 0 0.62 0.96 -0.3 0.4 -0.78 0.41 
TBK1 TBK1 MmuSTS.3947.1.S1_at -0.09 0.78 -0.61 0.49 -0.5 0.14 -0.86 0.19 
TIRAP TIRAP MmugDNA.502.1.S1_at -0.48 0.1 -0.81 0.10 0.1 0.79 -0.57 0.69 
TLR4 TLR4 MmuSTS.4032.1.S1_at -0.19 0.52 -0.67 0.40 -0.69 0.03 -0.92 -0.10 
TRAF6 TRAF6 MmuSTS.4612.1.S1_at 0.48 0.09 -0.09 0.82 0.04 0.91 -0.60 0.65 
TRAM TICAM2 MmuSTS.930.1.S1_at -0.24 0.43 -0.70 0.36 -0.6 0.07 -0.89 0.05
TRIF TICAM1 MmugDNA.27425.1.S1_at -0.35 0.23 -0.76 0.24 0.58 0.08 -0.08 0.88


Gene name | Gene symbol (RM) | Affymetrix probeset ID | r (SM, IL6) | P value (SM, IL6) | Lower CI (SM) | Upper CI (SM) | r (RM, IL6) | P value (RM, IL6) | Lower CI (RM) | Upper CI (RM)
AP1 JUN MmugDNA.22829.1.S1_at -0.45 0.12 -0.80 0.13 0.23 0.52 -0.47 0.75 
CD14 CD14 MmuSTS.1982.1.S1_at -0.05 0.87 -0.58 0.52 0.01 0.98 -0.62 0.64
IKKA CHUK MmuSTS.1867.1.S1_at -0.02 0.95 -0.56 0.54 0.05 0.89 -0.60 0.66 
IKKB IKBKB MmugDNA.8188.1.S1_at 0.07 0.82 -0.50 0.60 0.12 0.73 -0.55 0.70 
IKKG IKBKG MmuSTS.4600.1.S1_at -0.25 0.4 -0.71 0.35 -0.04 0.91 -0.65 0.60 
IRAK1 IRAK1 MmugDNA.38816.1.S1_at -0.24 0.43 -0.70 0.36 -0.22 0.54 -0.75 0.47 
IRF7 IRF7 MmugDNA.29625.1.S1_at 0.05 0.88 -0.52 0.58 -0.04 0.91 -0.66 0.60 
JNK MAPK8 MmugDNA.7819.1.S1_at -0.01 0.97 -0.56 0.54 0.32 0.37 -0.39 0.79 
MYD88 MYD88 MmugDNA.10008.1.S1_at -0.09 0.77 -0.61 0.48 0.34 0.33 -0.36 0.80
NFKB1 NFKB1 MmuSTS.3011.1.S1_at -0.05 0.87 -0.59 0.51 0.37 0.3 -0.34 0.81 
NFKB2 NFKB2 MmugDNA.25060.1.S1_at -0.01 0.97 -0.56 0.54 -0.32 0.36 -0.79 0.38 
P38 MAPK1 MmugDNA.1694.1.S1_at -0.25 0.41 -0.70 0.35 -0.15 0.69 -0.71 0.53 
RIP1 RIPK1 MmugDNA.40799.1.S1_at -0.61 0.03 -0.87 -0.10 0.34 0.34 -0.37 0.80 
TAB1 TAB1 MmuSTS.1553.1.S1_at -0.31 0.3 -0.74 0.29 -0.26 0.47 -0.76 0.44 
TAK1 MAP3K7 MmugDNA.34734.1.S1_s_at 0.03 0.91 -0.53 0.57 -0.31 0.38 -0.79 0.39 
TBK1 TBK1 MmuSTS.3947.1.S1_at -0.11 0.72 -0.62 0.47 0.35 0.32 -0.36 0.80 
TIRAP TIRAP MmugDNA.502.1.S1_at -0.06 0.83 -0.59 0.50 0.35 0.33 -0.36 0.80 
TLR4 TLR4 MmuSTS.4032.1.S1_at -0.13 0.68 -0.63 0.46 -0.15 0.68 -0.71 0.53 
TRAF6 TRAF6 MmuSTS.4612.1.S1_at 0.58 0.04 0.05 0.86 0.25 0.49 -0.45 0.76 
TRAM TICAM2 MmuSTS.930.1.S1_at 0.14 0.65 -0.45 0.64 -0.17 0.63 -0.72 0.51
TRIF TICAM1 MmugDNA.27425.1.S1_at -0.25 0.41 -0.70 0.35 0.05 0.89 -0.60 0.66


Pearson’s correlation coefficients (r) were calculated separately for C. atys (SM) and M. mulatta (RM) between cytokine protein measurements (TNF or IL-6) and mRNA levels of TLR4-signalling genes measured
in matched blood samples using Affymetrix GeneChips. P values denote the significance of each Pearson’s correlation coefficient. CI, confidence interval.
MicroRNAs from the parasitic plant Cuscuta
campestris target host messenger RNAs 


Saima Shahid!’, Gunjune Kim?, Nathan R. Johnson!, Eric Wafula*, Feng Wang!?+, Ceyda Coruh!+, Vivian Bernal-Galeano?, 
Tamia Phifer*, Claude W. dePamphilis'?, James H. Westwood? & Michael J. Axtell! 


Dodders (Cuscuta spp.) are obligate parasitic plants that obtain 
water and nutrients from the stems of host plants via specialized 
feeding structures called haustoria. Dodder haustoria facilitate 
bidirectional movement of viruses, proteins and mRNAs between 
host and parasite1, but the functional effects of these movements
are not known. Here we show that Cuscuta campestris haustoria 
accumulate high levels of many novel microRNAs (miRNAs) while 
parasitizing Arabidopsis thaliana. Many of these miRNAs are 22 
nucleotides in length. Plant miRNAs of this length are uncommon, 
and are associated with amplification of target silencing through 
secondary short interfering RNA (siRNA) production2. Several
A. thaliana mRNAs are targeted by 22-nucleotide C. campestris 
miRNAs during parasitism, resulting in mRNA cleavage, secondary 
siRNA production, and decreased mRNA accumulation. Hosts with 
mutations in two of the loci that encode target mRNAs supported 
significantly higher growth of C. campestris. The same miRNAs that 
are expressed and active when C. campestris parasitizes A. thaliana 
are also expressed and active when it infects Nicotiana benthamiana. 
Homologues of target mRNAs from many other plant species also 
contain the predicted target sites for the induced C. campestris 
miRNAs. These data show that C. campestris miRNAs act as trans-species
regulators of host-gene expression, and suggest that they may
act as virulence factors during parasitism. 

In host-induced gene silencing (HIGS), siRNA-producing transgenes 
silence targeted pathogen and parasite mRNAs in trans3,4. Plant-based
HIGS is effective against fungi5, nematodes6, insects7 and the parasitic
plant Cuscuta pentagona8. The ease with which HIGS can be introduced
into plants suggests that plants might naturally exchange small
RNAs with parasites. Consistent with this hypothesis, small RNAs from
Figure 1 | C. campestris miRNAs induced at the haustorial interface.
a, Mean abundance plot of C. campestris small-RNA loci comparing 
interface (I) to parasite stem (PS). Significantly upregulated (Up) loci are 
highlighted (alternative hypothesis: true difference > 2-fold, FDR < 0.05 
after Benjamini-Hochberg correction for multiple testing). nt, nucleotide. 
b, Predicted secondary structures of induced C. campestris miRNA hairpin 
the plant-pathogenic fungus Botrytis cinerea target host mRNAs during
infection9, and HIGS targeting of dicer-like mRNAs in B. cinerea
reduces pathogen virulence10. Conversely, host miRNAs are exported
from cotton into the fungal pathogen Verticillium dahliae11. However,
to our knowledge, no examples of naturally occurring trans-species
miRNAs have been described for plant–plant interactions.

Cuscuta haustoria facilitate bidirectional movement of viruses,
proteins, and mRNAs1, but the functional effects of these movements
are unclear. Cuscuta is susceptible to HIGS, so we hypothesized
that naturally occurring small RNAs might be exchanged across the 
C. campestris haustorium and affect gene expression in the recipient 
species. We profiled small-RNA expression in C. campestris grown 
on A. thaliana hosts using high-throughput small-RNA sequencing 
(small-RNA-seq). Two biological replicates each from three tissues 
were analysed: parasite stem, comprising a section of C. campestris 
stem above the site of haustorium formation; interface, comprising 
C. campestris stem with haustoria with associated A. thaliana stem 
tissue; and host stem, comprising sections of A. thaliana stems above 
the interface region, as previously described12. Small-RNA-producing
loci from both organisms were identified, classified, and subjected to 
differential-expression analyses (Supplementary Data 1). 

As expected, owing to dilution of parasite RNA with host RNA, 
C. campestris small-RNA loci were generally downregulated in 
the interface relative to the parasite stem (Fig. 1a). However, 76 
C. campestris small-RNA species were significantly upregulated in 
the interface relative to the parasite stem (false discovery rate (FDR) 
<0.05). Of these interface-enriched species, 43 (57%) were miRNA 
species with canonical accumulation of discrete miRNA-miRNA* pairs 
(the expected processing intermediates of miRNA biogenesis) from 
precursors with colour-coded small-RNA-seq coverage per nucleotide.
c, RNA blots of 22-nucleotide interface-induced miRNAs. HS, host 
stem; CS, control stem. U6, small nuclear RNA loading control. The 
experiment was performed twice with similar results. Full gels are shown 
in Supplementary Fig. 1. 


1Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA. 2Department of Biology, The Pennsylvania State University, University Park,
Pennsylvania 16802, USA. 3Department of Plant Pathology, Physiology and Weed Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA. 4Knox College,
Galesburg, Illinois 61401, USA. †Present addresses: Department of Biology, Indiana University, Bloomington, Indiana 47405, USA (F.W.); Salk Institute for Biological Studies, La Jolla, California
92037, USA (C.C.).
predicted stem-loop precursors (Fig. 1b, Supplementary Data 2–4).
RNA blots confirmed interface-specific expression of selected miRNAs
(Fig. 1c). One of the 43 miRNAs is a member of the conserved MIR164 
family; the other 42 upregulated miRNAs have low sequence similarity 
to known miRNA loci, and none of the mature miRNAs or miRNA* 
aligned perfectly with the A. thaliana genome (Supplementary Data 5). 
Several of the miRNA loci were detected by PCR of C. campestris
genomic DNA prepared from four-day-old seedlings that had never
interacted with a host plant (Extended Data Fig. 1). The majority of the
induced C. campestris miRNAs (26 out of 43) produced a 22-nucleotide 
mature miRNA. Such 22-nucleotide miRNAs occur less frequently than 
21-nucleotide miRNAs in plants, and they are strongly associated with 
accumulation of secondary siRNA from their targets13,14. Secondary
siRNAs are thought to amplify miRNA-directed gene silencing2.

We hypothesized that the induced 22-nucleotide miRNAs would 
cause formation of secondary siRNA from targeted host mRNAs. 
Therefore, we searched small-RNA-seq data for A. thaliana mRNAs 
that both contained plausible miRNA-complementary sites and shared 
sequences with siRNAs that accumulated specifically at the interface. 
Six A. thaliana mRNAs were found that met both criteria: TIR1, AFB2 
and AFB3, which encode partially redundant auxin receptors15; BIK1,
which encodes a plasma-membrane-localized kinase required for
both pathogen-induced and developmental signalling16,17; SEOR1,
which encodes an abundant phloem protein that reduces photosynthate
loss from the phloem after injury18,19; and HSFB4 (also known
as SCZ), which encodes a predicted transcriptional repressor that is
required for the formation of ground-tissue stem cells in roots20–22.
The siRNAs produced from these mRNAs resembled other examples 
of secondary siRNAs in their size distributions, double-stranded accu- 
mulation, and phasing (Fig. 2a, b, Extended Data Fig. 2). TIR1, AFB2 
and AFB3 are also known to be targeted by the 22-nucleotide miRNA 
miR393, and to produce secondary siRNAs downstream of the miR393-complementary
site23. In parasitized stems, the location and phase
register of the TIR1, AFB2 and AFB3 secondary siRNAs shift upstream, 
proximal to the sites that are complementary to the C. campestris 
miRNAs (Extended Data Fig. 2), implying that the C. campestris 
miRNAs, and not miR393, are triggering the interface-specific 
secondary siRNAs. The predominant 21-nucleotide phase register at 
several loci was shifted by +1 to +2 nucleotides relative to predictions. 
This is consistent with the ‘phase drift’ seen at other phased siRNA
loci24,25 that causes the register to shift forward, and is probably
due to the presence of low levels of 22-nucleotide siRNAs. Analysis
of uncapped mRNA fragments provided strong evidence for miRNA-directed
cleavage at all of the sites complementary to C. campestris
miRNAs in interface samples, but not in control stem samples
(Fig. 2, Extended Data Fig. 2). We did not find any
induced miRNAs or siRNAs from the A. thaliana host that were capable 
of targeting these six mRNAs. We also did not find any endogenous C. 
campestris secondary siRNA loci corresponding to any of the induced 
miRNAs. Some C. campestris orthologues of TIR1, HSFB4 and BIK1 
had possible, but very poorly complementary, miRNA target sites 
(Extended Data Fig. 3). These observations suggest that the induced 
C. campestris miRNAs have evolved to avoid targeting ‘self’ transcripts. 
We conclude that 22-nucleotide miRNAs from C. campestris act in a 
trans-species manner to target A. thaliana mRNAs. 
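The phase-register analysis (the radar chart in Fig. 2c and the ‘phase drift’ described above) asks which of the 21 possible offsets each 21-nucleotide siRNA falls into, relative to a reference position. The Python sketch below tallies register fractions under two stated assumptions. First, the reference (anchor) position is supplied by the user, for example the position just after a predicted miRNA cleavage site. Second, antisense reads are shifted by 2 nucleotides to account for the 2-nucleotide 3′ overhang of a Dicer duplex, so that both strands of one duplex share a register. The function name and read format are hypothetical.

```python
from collections import Counter

PHASE = 21  # DCL4-type phasing period in nucleotides


def phase_fractions(reads, anchor, period=PHASE):
    """Fraction of period-length reads in each of `period` phase registers.

    reads  : iterable of (start, length, strand); start is the 1-based
             leftmost aligned position on the mRNA, strand is '+' or '-'.
    anchor : position defining register 0 (an assumption supplied by the user).
    """
    counts = Counter()
    for start, length, strand in reads:
        if length != period:
            continue  # only 21-nt siRNAs define the register here
        # Antisense reads: shift by the 2-nt 3' overhang of a Dicer duplex.
        offset = start - anchor - (2 if strand == "-" else 0)
        counts[offset % period] += 1  # Python's % is non-negative here
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in range(period)]


if __name__ == "__main__":
    reads = [(101, 21, "+"), (122, 21, "+"), (103, 21, "-"), (105, 21, "+"), (150, 22, "+")]
    fractions = phase_fractions(reads, anchor=101)
    print({i: f for i, f in enumerate(fractions) if f})
```

With this convention, a forward phase drift of +1 to +2 nucleotides would appear as extra mass in registers 1 and 2 rather than register 0.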

Accumulation of five of the six targets was significantly reduced 
in parasitized stems compared to control stems (Fig. 3a). The true 
magnitude of repression of these targets could be even greater, since 
many miRNAs also direct translational repression. Accumulation of 
A. thaliana secondary siRNAs is partially dependent on the endonu- 
clease DCL4 (DICER-LIKE 4) and wholly dependent on RDR6 (RNA- 
DEPENDENT RNA POLYMERASE 6, also known as SGS2 or SDE1)’. 
Accumulation of an abundant secondary siRNA from TIR1 was elimi- 
nated entirely in the sgs2-1 mutant, and reduced in the dcl4-2t mutant 
(Fig. 3b). Thus, host DCL4 and RDR6 are required for secondary siRNA 
production. This implies that the C. campestris-derived miRNAs are 
Figure 2 | C. campestris miRNAs cause slicing and phased siRNA
production from host mRNAs. a, Small-RNA-seq coverage for A. thaliana
SEOR1 in host stem, interface and parasite stem (n = 2 biologically
independent samples each). miRNA complementarity, RNA-ligase-mediated
5′-rapid amplification of cDNA ends (5′-RLM-RACE) results and the
expected phase register are indicated; r.p.m., reads per million. b, Length and polarity
distribution of SEOR1-mapped siRNAs from interface samples.
c, Radar chart showing the fraction of interface-derived siRNAs in each
possible 21-nucleotide phasing register. d, 5′-RLM-RACE products from
nested amplifications. ARF17, positive control. The experiment was
performed once. Full gels are shown in Supplementary Fig. 1.


active inside host cells and hijack the host’s own silencing machinery 
to produce secondary siRNAs. 

In repeated trials, we did not observe consistent significant differences
in parasite fresh weight using dcl4-2t and sgs2-1 mutants as hosts
(Extended Data Fig. 4). Thus, loss of induced secondary siRNAs is not
sufficient to affect parasite growth in this assay. Similarly, there were
no significant differences in biomass of C. campestris grown on scz2
or tir1-1/afb2-3-mutant hosts (Fig. 3c). Significantly less (P < 0.05)
C. campestris biomass was observed using the bik1 mutant as host
(Fig. 3c). However, interpretation of this result was complicated by the
weak, frequently lodging stems of the bik1 mutant16. BIK1 is involved
in both plant development and immunity, and its developmental functions
may mask its role in the C. campestris interaction. Significantly
more (P < 0.05) C. campestris biomass was observed on seor1 or afb3-4
mutants (Fig. 3c). Therefore, both SEOR1 and AFB3 function to restrict
C. campestris growth on A. thaliana. This observation is consistent with
the hypothesis that both SEOR1 and AFB3 are trans-species miRNA
targets of biological relevance in A. thaliana.

C. campestris has a broad host range among eudicots”’. Therefore, 
we searched for sites in eudicot orthologues of the targeted A. thaliana 
mRNAs that were complementary to the C. campestris miRNAs 
induced specifically at the interface. Probable orthologues of BIK1, 
SEOR1, TIR1 and HSFB4 were identified as predicted targets of 
interface-induced miRNAs in many eudicot species, while only one 
species had predicted targets in the orthologues of the negative control 
GAPDH (Fig. 4a, Extended Data Table 1). We conclude that the induced 
C. campestris miRNAs would be able to collectively target TIR1, SEOR1, 
HSFB4 and BIK1 orthologues in many eudicot species. 
Figure 3 | Effects of C. campestris miRNAs and their targets.
a, Accumulation of A. thaliana mRNA in interface versus control
stems, shown by quantitative reverse-transcriptase polymerase chain
reaction (qRT-PCR). Control stems, n = 8; interface, n = 7 biologically
independent samples. Box plots show the median, box edges represent
the first and third quartiles, and the whiskers extend to 1.5× the interquartile
range. P values are displayed above the x axis; Wilcoxon rank-sum tests,


We performed additional small-RNA-seq from C. campestris on 
A. thaliana hosts, and from C. campestris on N. benthamiana hosts. 
Both sets of experiments were designed identically to the original 
small-RNA-seq study (two biological replicates each of host stem, 
interface and parasite stem samples). The interface-induced set of 
C. campestris miRNA loci was highly reproducible across both of the 
A. thaliana experiments as well as the N. benthamiana experiment 
(Extended Data Fig. 5). Induction of several C. campestris miRNAs 
during N. benthamiana parasitism was confirmed by RNA blots 
(Fig. 4b). Several N. benthamiana mRNAs both contained plausible 
target sites for C. campestris miRNAs and showed accumulation 
of phased, secondary siRNAs in the interface samples, including 
orthologues of TIR1 and BIK1 (Extended Data Fig. 6). Analysis of
uncapped RNA ends provided strong evidence for miRNA-directed 
cleavage of one of the N. benthamiana TIR1 orthologues (Fig. 4c, d). 
This is direct evidence that the same C. campestris miRNAs target 
orthologous host mRNAs in multiple species. None of the interface- 
induced miRNAs we tested were detectable in C. campestris pre-haustoria 
from seedling tips that had coiled around dead bamboo stakes instead 
of a live host (Fig. 4b, Extended Data Fig. 7). This suggests that contact
with a living host is a requirement for expression of these miRNAs. 
unpaired, one-tailed. AT4G34270 (also known as TIP41-like protein),
control. b, RNA blots from C. campestris infestations of the indicated
A. thaliana genotypes. Full gels are shown in Supplementary Fig. 1. The
experiment was performed twice with similar results. c, C. campestris
biomass on A. thaliana hosts of the indicated genotypes. P values and
plotting conventions as in a, except that two-tailed tests were used; n = 11, 8,
11, 10, 14, and 7 biologically independent samples (left to right).
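Fig. 3a uses unpaired, one-tailed Wilcoxon rank-sum tests on n = 8 control and n = 7 interface samples. At these sample sizes, the exact null distribution can be enumerated directly: there are C(15, 7) = 6,435 equally likely ways to assign ranks to the interface group. The Python sketch below computes an exact one-tailed P value in this way. It assumes there are no tied values. It illustrates the test and is not the software the authors used; the function name is hypothetical.

```python
from itertools import combinations


def rank_sum_p_greater(x, y):
    """Exact one-tailed Wilcoxon rank-sum P value for H1: x tends to exceed y.

    Assumes no ties. Enumerates every way of assigning len(x) of the pooled
    ranks to group x, so it is only practical for small samples.
    """
    pooled = sorted(list(x) + list(y))
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    observed = sum(rank[v] for v in x)  # rank-sum statistic for group x
    n_total, n_x = len(pooled), len(x)
    extreme = total = 0
    for combo in combinations(range(1, n_total + 1), n_x):
        total += 1
        if sum(combo) >= observed:  # at least as extreme in the 'greater' direction
            extreme += 1
    return extreme / total
```

To test for reduced mRNA accumulation in interfaces, the control values would be passed first, for example `rank_sum_p_greater(control_values, interface_values)`.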


These data demonstrate that C. campestris induces a large number 
of miRNAs at the haustorium, and that some of these miRNAs 
target and reduce accumulation of host mRNAs. Many of the 
induced miRNAs are 22 nucleotides in length, and are associated 
with secondary siRNA production from their host targets using the 
host’s secondary siRNA machinery. Several of the targets are linked 
to plant pathogenesis: manipulation of levels of TIR1, AFB2, and 
AFB3 mRNA affects bacterial pathogenesis and defence signalling”’, 
and BIK1 is a central regulator of pathogen-induced signalling”®. 
Perhaps the most intriguing target is SEOR1, which encodes a very
abundant protein that is present in large agglomerations in phloem
sieve-tube elements18. seor1 mutants show an increased loss of
sugars from detached leaves19, and our data show that seor1 mutants
also support increased C. campestris growth. A key function of the
haustorium is to capture nutrients from the host phloem; targeting
SEOR1 could be a strategy to increase sugar uptake from host phloem.
Overall, these data suggest that C. campestris trans-species miRNAs 
might function as virulence factors to remodel host gene expression 
to the parasite’s advantage. Further experiments that directly disrupt 
the delivery or function of these miRNAs will be needed to test this 
hypothesis.
Figure 4 | Conservation of host mRNA targeting by C. campestris.
a, Predicted targets of interface-induced C. campestris miRNA-miRNA*.
NA, no orthologous genes found. b, RNA blots from interface and control
stem of C. campestris-infested N. benthamiana (Nb), A. thaliana (At), and
C. campestris pre-haustoria (PH). The experiment was performed once.
c, 5′-RLM-RACE products for the indicated N. benthamiana cDNAs.
ARF, positive control. The image was cropped to remove irrelevant
lanes. Full gels are shown in Supplementary Fig. 1. The experiment was
performed once. d, Complementary site and 5′-RLM-RACE results from an
N. benthamiana TIR1 orthologue.

84 | NATURE | VOL 553 | 4 JANUARY 2018
METHODS


Cuscuta was initially obtained from a tomato field in California, and seed stocks 
were derived from self-pollination through several generations in the Westwood 
laboratory. The isolate was initially identified as Cuscuta pentagona.
C. pentagona is very closely related to C. campestris, and the two are distinguished 
by microscopic differences in floral morphology; because of this they have 
often been confused”*. We subsequently determined that our isolate is indeed 
C. campestris. A. thaliana sgs2-1 mutants*® were a gift from H. Vaucheret (INRA 
Versailles). A. thaliana dcl4-2t mutants (GABI_160G05*") were obtained from the 
Arabidopsis Biological Resource Center (Ohio State University). A. thaliana seor1
mutants (GABI-KAT 609F04'*) were a gift from M. Knoblauch (Washington State
University). The A. thaliana tir1-1/afb2-3 and afb3-4 mutants” were a gift from
G. Monshausen (Pennsylvania State University). The bik1 mutant16 was a gift from
T. Mengiste (Purdue University). The scz2 mutant” was a gift from R. Heidstra
(Wageningen University). All A. thaliana mutants were on the Col-0 background. 
Growth conditions and RNA extractions. For initial experiments (small-RNA- 
seq and RNA blots in Fig. 1), A. thaliana (Col-0) plants were grown in a growth 
room at 18–20 °C with 12 h light per day, illuminated (200 µmol m−2 s−1) with
metal-halide (400 W, GE multi-vapour lamp) and spot-gro (65 W, Sylvania) lamps. 
C. campestris seeds were scarified in concentrated sulfuric acid for 45 min, rinsed 
5-6 times with distilled water and dried. The seeds were placed in potting medium 
at the base of four-week-old A. thaliana seedlings and allowed to germinate and 
attach to hosts. The C. campestris plants were allowed to grow and spread on host 
plants for an additional three weeks to generate a supply of uniform shoots for use 
in the experiment. Sections of C. campestris shoot tip (~10 cm long) were placed
on the floral stems of a fresh set of A. thaliana plants. Parasite shoots coiled around 
the host stems and formed haustorial connections. Tissues from plants that had 
established C. campestris with at least two coils around healthy host stems and 
clear parasite growth were used in these studies. Control plants were grown under 
the same conditions as parasitized plants, but were not exposed to C. campestris. 

For the preparation of tissue-specific small-RNA libraries, tissues were 
harvested after C. campestris cuttings had formed active haustorial connections to 
the host. This was evidenced by growth of the C. campestris shoot to a length of at
least 10 cm beyond the region of host attachment (7–10 days after infection). Three
tissues were harvested from the A. thaliana–C. campestris associations: 2.5 cm of
A. thaliana stem above the region of attachment; A. thaliana and C. campestris
stems in the region of attachment (referred to as the interface); and 2.5 cm of the
parasite stem near the point of attachment. To remove any possible cross-contamination
between A. thaliana and C. campestris, harvested regions of the
parasite and host stem were taken 1 cm away from the interface region, and the
surface of each harvested tissue was cleaned by immersion for 5 min in 70% ethanol;
the ethanol was decanted and replaced, the process was repeated three times, and
the stems were blotted dry with a Kimwipe after the final rinse. All three sections
of tissue were harvested at the same time, and material from 20 attachments was 
pooled for small-RNA extraction. Small RNA was extracted from ~100 mg of each 
tissue using the mirPremier microRNA Isolation Kit (Sigma-Aldrich) according 
to the manufacturer's protocol. Small RNA was analysed using a small-RNA kit
(Agilent) on a 2100 Bioanalyzer platform. 

Samples used for 5′-RLM-RACE (Fig. 2d) and qRT-PCR (Fig. 3a) analyses of
A. thaliana targets were prepared as described above with the following 
modifications: Col-0 A. thaliana hosts were cultivated in a growth room with 16-h 
days, 8-h nights, at ~23 °C under cool-white-fluorescent lamps. Attachment of 
C. campestris cuttings was promoted by illumination with far-red LED lighting for 
3-5 days, and total RNA was extracted using Tri-reagent (Sigma) per the manu- 
facturer’s suggestions, followed by a second sodium-acetate-ethanol precipitation 
and wash step. Samples used for RNA blots of secondary siRNA accumulation 
from A. thaliana mutants and replicate small-RNA-seq libraries were obtained 
similarly, except that the samples were derived from the primary attachments of 
C. campestris seedlings on the hosts instead of from cuttings. In these experiments, 
scarified C. campestris seedlings were first germinated on moistened paper towels 
for three days at ~28 °C, then placed adjacent to the host plants with their radicles 
submerged in a water-filled 0.125-ml tube. 

C. campestris pre-haustoria (Extended Data Fig. 7) were obtained by scarifying, 
germinating and placing seedlings as described above, next to bamboo stakes in 
soil, under illumination from cool-white fluorescent lights and far-red-emitting 
LEDs. Seedlings coiled and produced pre-haustoria four days after being placed, 
and were harvested and used for total-RNA extraction (used for RNA blots in 
Fig. 4b) using Tri-reagent as described above. N. benthamiana was grown in a 
growth room with 16-h days, 8-h nights, at ~23°C, under cool-white fluorescent 
lamps. Three-to-four-week-old plants served as hosts for scarified and germinated 
C. campestris seedlings. Attachment was promoted by three to six days of
supplemental illumination with far-red-emitting LEDs. Under these conditions, C. campestris
attached to the petioles, and not the stems, of the N. benthamiana hosts. Interfaces
and control petioles from un-parasitized hosts were collected seven-to-eight days
after successful attachments, and total RNA (used for RNA blots in Fig. 4b and 
small-RNA-seq libraries) was recovered using Tri-reagent as described above. 
Small-RNA-seq. The initial small-RNA-seq libraries were constructed using the 
Tru-Seq small-RNA kit (Illumina) per the manufacturer's protocol and sequenced 
on a HiSeq 2500 instrument (Illumina). Subsequent small-RNA-seq libraries
(replicate two using A. thaliana hosts, and the N. benthamiana experiments)
used the NEBnext small-RNA library kit (New England Biolabs), following the
manufacturer's instructions. Raw small-RNA-seq reads were trimmed to remove
3′ adapters and filtered for quality and a trimmed length of at least 16 nucleotides using
cutadapt* version 1.9.1 with the settings “-a TGGAATTCTCGGGTGCCAAGG
--discard-untrimmed -m 16 --max-n=0”. For experiments where A. thaliana was
the host, trimmed reads that aligned with zero or one mismatch (using bowtie™* 
version 1.1.2, settings “-v 1”) to the A. thaliana plastid genome, the Cuscuta 
gronovii plastid genome (C. gronovii was the closest relative to C. campestris for 
which a completed plastid-genome assembly was publicly available), A. thaliana 
rRNAs, tRNAs, small nuclear RNAs (snRNAs), and small nucleolar RNAs (snoRNAs)
were removed. Similarly, for experiments where N. benthamiana was the
host, the reads were cleaned against the C. gronovii plastid genome, the N. tabacum
plastid genome and rRNAs, and a set of tRNAs predicted from the N. benthamiana
genome using tRNAscan-SE.
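The cutadapt settings above discard reads without a detectable 3′ adapter, keep inserts of at least 16 nucleotides, and reject inserts containing any N. The Python sketch below mimics that logic using exact string matching only. Unlike cutadapt, it tolerates no sequencing errors within the adapter, and the minimum partial-adapter overlap of 3 nucleotides is an assumption. The function name is hypothetical.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # 3' adapter from the cutadapt settings above


def trim_and_filter(read, adapter=ADAPTER, min_len=16, min_overlap=3):
    """Return the trimmed insert, or None if the read should be discarded.

    Simplified, exact-match version of the trimming step: looks for the first
    full adapter occurrence, otherwise the longest partial adapter prefix
    (>= min_overlap nt) at the 3' end of the read.
    """
    idx = read.find(adapter)
    if idx == -1:
        idx = None
        # Partial adapter at the 3' end, longest overlap first.
        for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
            if read.endswith(adapter[:k]):
                idx = len(read) - k
                break
    if idx is None:
        return None  # no adapter found: discard, like --discard-untrimmed
    insert = read[:idx]
    if len(insert) < min_len or "N" in insert.upper():
        return None  # enforce -m 16 and --max-n=0
    return insert
```

For example, a 22-nucleotide insert followed by the full adapter (or by its first 6 nucleotides) is returned unchanged, while an 8-nucleotide insert is rejected.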

For the original A. thaliana host data, the clean reads were aligned and analysed 
with reference to the combined TAIR10 A. thaliana reference genome and a
preliminary version 0.1 draft genome assembly of C. campestris using ShortStack*? 
(version 3.8.3) with default settings. The resulting annotated small-RNA loci 
(Supplementary Data 1) were analysed for differential expression (interface 
versus parasite stem) using DESeq2**, with a log2 fold-change threshold of 1, alternative
hypothesis of ‘greaterAbs’, and alpha of 0.05. P values were adjusted for multiple
testing using the Benjamini-Hochberg procedure, and loci with an adjusted 
P value of <0.05 (equivalent to an FDR of <0.05) were denoted upregulated in 
interfaces relative to parasite stem. Among the upregulated loci, those annotated 
by ShortStack as miRNAs deriving from the C. campestris genome which produced 
either a 21- or 22-nucleotide mature miRNA (Supplementary Data 2) were retained 
and further analysed. The predicted secondary structures and observed small-RNA-seq
read coverage were visualized (Supplementary Data 3, 4) using strucVis
(version 0.3; https://github.com/MikeAxtell/strucVis). 
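The "adjusted P value < 0.05" criterion relies on Benjamini–Hochberg adjustment of the per-locus P values. DESeq2 performs this internally. By default it also applies independent filtering, which is not reproduced here. The Python sketch below shows only the adjustment step and the resulting call. The function names and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def upregulated(pvalues, alpha=0.05):
    """Indices whose BH-adjusted P value is below alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < alpha]


if __name__ == "__main__":
    example = [0.01, 0.04, 0.03, 0.005]  # made-up per-locus P values
    print(benjamini_hochberg(example))
    print(upregulated(example, alpha=0.03))
```

Because adjusted values are made monotone in the rank of the raw P value, a locus can never be called significant while another locus with a smaller raw P value is not.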

For analysis of mRNA-derived secondary siRNAs, the clean small-RNA-seq 
reads from the original A. thaliana experiment were aligned to the combined 
TAIR10 representative cDNAs from A. thaliana and our preliminary version
0.1 transcriptome assembly for C. campestris, using ShortStack*° v3.8.3, with the
settings --mismatches 0 and --nohp, and defining the full length of each mRNA as a
locus using the option --locifile. The resulting counts of small-RNA alignments for
each mRNA were used for differential-expression analysis, comparing interface to
host stem, using DESeq2** as described above. A. thaliana mRNAs with significantly
upregulated (FDR < 0.05) small RNAs in the interface compared to host stem
were retained for further analysis. The cDNA sequences of these loci were retrieved
and used for miRNA target predictions using GSTAr (v1.0; https://github.com/ 
MikeAxtell/GSTAr); the full set of mature miRNA and miRNA* (Supplementary 
Data 2) from the interface-induced C. campestris miRNA loci were used as queries. 
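This excerpt does not describe GSTAr's scoring scheme. As a stand-in, the Python sketch below scores an ungapped miRNA–target duplex using the widely used plant-miRNA penalty convention: mismatch = 1, G:U wobble = 0.5, with penalties doubled at miRNA positions 2 to 13. It shows what a "plausible miRNA-complementary site" means in practice. It is not GSTAr's algorithm, and the function name is hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def target_penalty(mirna, site):
    """Mismatch penalty for an ungapped miRNA:target-site duplex.

    mirna : miRNA sequence 5'->3' (DNA or RNA alphabet).
    site  : target-site sequence 5'->3', same length as the miRNA.
    miRNA position 1 pairs with the 3'-most nucleotide of the site.
    Mismatch = 1, G:U = 0.5; penalties doubled at miRNA positions 2-13.
    """
    mirna = mirna.upper().replace("T", "U")
    site = site.upper().replace("T", "U")
    if len(mirna) != len(site):
        raise ValueError("this ungapped sketch needs equal-length sequences")
    score = 0.0
    for pos, (m, t) in enumerate(zip(mirna, reversed(site)), start=1):
        if (m, t) in PAIRS:
            penalty = 0.0
        elif (m, t) in WOBBLE:
            penalty = 0.5
        else:
            penalty = 1.0
        if 2 <= pos <= 13:  # 'seed' and central region weighted more heavily
            penalty *= 2
        score += penalty
    return score
```

A perfectly complementary site scores 0. A single mismatch scores 1 at miRNA position 1 but 2 at position 5, which reflects the extra weight placed on the 5′ region of plant miRNAs.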

Analysis of the second set of A. thaliana-C. campestris small-RNA-seq data 
aligned the cleaned reads to the combined A. thaliana and C. campestris reference 
genomes as described above, except that the list of loci derived in the analysis of 
the original data (Supplementary Data 1) was used as the --locifile input in the ShortStack
analysis. Differential-expression analysis was then performed using DESeq2 as 
described above. Analysis of the N. benthamiana-C. campestris small-RNA-seq 
data began with a ShortStack analysis of the cleaned reads against the combined 
N. benthamiana (v0.4.4) genome and the preliminary assembly of the C. campestris 
genome, using default settings. The de novo N. benthamiana loci obtained from 
this run were retained. The resulting alignments were used to quantify abundance 
of small RNAs from the C. campestris small-RNA loci defined with the original 
data. The resulting read counts were then used for differential-expression analysis 
with DESeq2 as described above. Analysis of secondary siRNAs derived from 
N. benthamiana mRNAs was performed in a similar way to the A. thaliana mRNA 
analysis described above, except that the combined transcriptomes were from 
C. campestris and N. benthamiana (v0.4.4 annotations). 

RNA blots. Small-RNA gel blots were performed as previously described, with modifications. For the blots shown in Fig. 1b, 1.8 μg of small RNA from each sample was separated on 15% TBE-Urea precast gels (Bio-Rad), transblotted onto Hybond NX membrane and cross-linked using 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide. Hybridization was carried out in 5× SSC, 2× Denhardt's solution, 20 mM sodium phosphate (pH 7.2) and 7% SDS with 100 μg ml⁻¹ salmon testes DNA (Sigma-Aldrich). Probe labelling, hybridization and washing were performed as described. Radioactive signals were detected using a Typhoon FLA 7000 (GE Healthcare). Membranes were stripped between hybridizations by washing with 1% SDS for 15 min at 80 °C. They were then exposed for at least 24 h to verify complete removal of probe before re-hybridization. Blots in Figs 3b and 4b were performed similarly, except that 12 μg of total RNA was used. Probe sequences are listed in Supplementary Data 6.

© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
5'-RNA ligase-mediated rapid amplification of cDNA ends (5'-RLM-RACE). Five micrograms of total RNA was ligated to 1 μg of a 44-nucleotide RNA adaptor (Supplementary Data 6). The ligation used a 20-μl T4 RNA ligase 1 reaction (NEB) set up according to the manufacturer's instructions, with a 1-h incubation at 37 °C. The reaction was then diluted with 68 μl water and 2 μl 0.5 M EDTA pH 8.0, and incubated at 65 °C for 15 min to inactivate the ligase. Sodium acetate pH 5.2 was added to a final concentration of 0.3 M, and the RNA was precipitated with ethanol. The precipitated and washed RNA was resuspended in 10 μl water. Of this sample, 3.33 μl was used as template in a reverse transcription reaction using random primers and ProtoScript II reverse transcriptase (NEB) according to the manufacturer's instructions. The resulting cDNA was used as template in a first-round PCR with a 5' primer matching the RNA adaptor and a 3' gene-specific primer (Supplementary Data 6). One microlitre of the product was used as template for nested PCR with nested primers (Supplementary Data 6). Gene-specific primers for A. thaliana cDNAs were based on the representative TAIR10 transcript models, whereas those for N. benthamiana cDNAs were based on the v0.4.4 transcripts (Sol Genomics Network). In Fig. 4c, N. benthamiana TIR1 is transcript ID NbS00011315g0112.1 and N. benthamiana ARF is transcript ID NbS00059497g0003.1. Bands were purified from agarose gels and cloned into pCR4-TOPO (Life Technologies). Inserts from individual clones were recovered by colony PCR and analysed by Sanger sequencing.
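To compare 5'-RLM-RACE clones with a predicted slicing site, we first need the position where the 3' cleavage remnant should begin. This sketch makes three assumptions: canonical plant miRNA-directed slicing between the target nucleotides paired with miRNA positions 10 and 11, an ungapped duplex, and 1-based transcript coordinates. The function names are hypothetical.

```python
def predicted_remnant_start(site_start: int, mirna_length: int) -> int:
    """1-based transcript position expected at the 5' end of the 3' slicing remnant.

    site_start is the 5'-most transcript nucleotide of the target site (it pairs with
    the miRNA's 3' end). Assumes an ungapped duplex and slicing between the target
    nucleotides paired with miRNA positions 10 and 11.
    """
    site_end = site_start + mirna_length - 1   # pairs with miRNA position 1
    # miRNA position p pairs with transcript position site_end - (p - 1).
    return site_end - (10 - 1)                  # nucleotide paired with miRNA position 10


def clones_at_site(clone_5p_ends, expected):
    """(clones whose 5' end matches the prediction, total clones)."""
    hits = sum(1 for pos in clone_5p_ends if pos == expected)
    return hits, len(clone_5p_ends)
```

For a 21-nucleotide miRNA whose site starts at transcript position 100, the predicted remnant starts at position 111. The pair returned by `clones_at_site` corresponds to the clone fractions reported in the extended-data figures.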
Quantitative reverse-transcription PCR. Total RNA used for RT-PCR was first treated with DNase I (RNase-free; NEB) according to the manufacturer's instructions, then ethanol precipitated and resuspended. The treated total RNA (2 μg) was used for cDNA synthesis with the High Capacity cDNA Synthesis Kit (Applied Biosystems) according to the manufacturer's instructions. PCR reactions used PerfeCTa SYBR Green FastMix (Quantabio) on a StepOnePlus quantitative PCR system (Applied Biosystems) according to the manufacturer's instructions. Primers (Supplementary Data 6) were designed to span the miRNA target sites so that only uncleaved mRNAs were measured. Three reference mRNAs were used: ACT2, AT1G13320 (which encodes PDF2, a subunit of PP2A) and AT4G34270. Raw Ct values were used to calculate relative normalized expression values against each reference mRNA separately. The final analysis used the median relative expression values between the ACT2- and AT4G34270-normalized data.
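The normalization above can be illustrated with the comparative Ct (2^−ΔΔCt) method. The text does not say which relative-quantification formula was used, so this choice is an assumption. Perfect amplification efficiency, the use of a calibrator sample and the example Ct values are also assumptions. Only the per-reference normalization followed by a median comes from the text.

```python
from statistics import median


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """2^-ddCt expression of a sample relative to a calibrator, assuming ~100% efficiency."""
    d_ct_sample = ct_target - ct_ref
    d_ct_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(d_ct_sample - d_ct_calibrator)


def median_over_references(ct_target, ct_target_cal, references):
    """Normalize to each reference separately, then take the median across references.

    references maps a reference-gene name to (Ct in sample, Ct in calibrator).
    """
    values = [
        relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal)
        for ct_ref, ct_ref_cal in references.values()
    ]
    return median(values)
```

With only two references the median reduces to the mean of the two normalized values. The function accepts any number of references, so a third (such as AT1G13320) could be included.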
C. campestris growth assays. C. campestris seedlings were scarified, pre- 
germinated, and placed next to hosts in 0.125 ml water-filled tubes under cool- 
white fluorescent lighting supplemented with far-red-emitting LEDs (16-h day, 8-h 
night) at ~23°C as described above. After a single attachment formed (four days), 
far-red light supplementation was removed to prevent secondary attachments. 
After 18 more days of growth, entire C. campestris vines were removed and weighed 
(Fig. 3c). Multiple additional growth trials were performed specifically on the 
dcl4-2t and sgs2-1 mutant hosts under varying conditions (Extended Data Fig. 4). 
miRNA target predictions. To find probable orthologues of A. thaliana genes of interest, the A. thaliana protein sequences were used as queries for a BLASTP analysis of the 31 eudicot proteomes available on Phytozome 11 (https://phytozome.jgi.doe.gov/pz/portal.html#). Transcript sequences for the top 100 hits were retrieved. In some cases no hits were found in a particular species; these are shown as 'NA' in Fig. 4a. The miRNA query set comprised all mature miRNA and miRNA* sequences from the interface-induced, C. campestris-derived 21- or 22-nucleotide miRNAs (Supplementary Data 2). Probable targets from the 31 species were identified as those with a score of 4.5 or less using targetfinder.pl v0.1 (https://github.com/MikeAxtell/TargetFinder/).
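targetfinder.pl assigns each miRNA:target duplex a penalty score. To our understanding it uses a widely used plant scheme: 1 per mismatch, gap or bulge, and 0.5 per G:U wobble, with penalties doubled at miRNA positions 2 to 13. The sketch below reimplements only the ungapped part of that scheme. It is an illustration labelled as an assumption, not the script's actual code. It also assumes both sequences are written 5' to 3' and have the same length.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def duplex_score(mirna: str, target_site: str) -> float:
    """Penalty score for an ungapped miRNA:target duplex (simplified sketch).

    Both sequences are 5'->3'; the miRNA's 5' end pairs with the site's 3' end.
    Mismatch = 1, G:U wobble = 0.5, doubled at miRNA positions 2-13.
    """
    mirna = mirna.upper().replace("T", "U")
    site = target_site.upper().replace("T", "U")
    if len(mirna) != len(site):
        raise ValueError("this ungapped sketch needs equal-length sequences")
    score = 0.0
    for pos, (m, t) in enumerate(zip(mirna, reversed(site)), start=1):
        if (m, t) in WATSON_CRICK:
            penalty = 0.0
        elif (m, t) in WOBBLE:
            penalty = 0.5
        else:
            penalty = 1.0
        if 2 <= pos <= 13:
            penalty *= 2  # seed-proximal positions weigh double
        score += penalty
    return score


def passes(score: float, cutoff: float = 4.5) -> bool:
    """Keep hits scoring at or below the cutoff used in the text."""
    return score <= cutoff
```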

N. benthamiana orthologues of A. thaliana proteins were found by BLASTP searches against the v0.4.4 N. benthamiana protein models at Sol Genomics Network, and miRNA target sites were predicted using targetfinder.pl as above.

Statistics and reproducibility. No statistical methods were used to predetermine sample size. The experiments were not randomized. 95% confidence intervals from Fig. 3a: 0.249 to 0.611 (BIK1), 0.267 to 0.781 (SEOR1), −0.122 to 0.649 (HSFB4), 0.385 to 0.894 (TIR1), 0.083 to 0.724 (AFB2), 0.071 to 0.678 (AFB3), −0.461 to −0.120 (AT4G34270). These confidence intervals come from unpaired Wilcoxon rank-sum tests and are for the estimator of the median of the difference between control stem and interface (control stem minus interface) for each gene. 95% confidence intervals from Fig. 3c: −0.580 to −0.200 (seor1), 0.220 to 0.400 (bik1), −0.070 to 0.150 (scz2), −0.170 to 0.060 (tir1-1/afb2-3), −0.440 to −0.180 (afb3-4). These confidence intervals also come from unpaired Wilcoxon rank-sum tests and are for the estimator of the median of the difference between Col-0 and the mutant (Col-0 minus mutant) for each comparison.
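For an unpaired Wilcoxon rank-sum test, the accompanying point estimate is the Hodges-Lehmann shift: the median of all pairwise differences between the two groups. The Python sketch below computes that estimate and an approximate interval from the order statistics of the pairwise differences. It relies on a normal approximation with no tie correction and is not the software used for the analysis. R's `wilcox.test` with `conf.int = TRUE` can use exact distributions for small samples, so its limits may differ slightly.

```python
import math
from statistics import NormalDist, median


def hodges_lehmann(x, y):
    """Median of all pairwise differences x_i - y_j (e.g. control minus interface)."""
    return median([xi - yj for xi in x for yj in y])


def shift_confidence_interval(x, y, level=0.95):
    """Approximate CI for the location shift, from order statistics of pairwise differences.

    Uses the normal approximation to the Mann-Whitney U distribution without a
    tie correction, so it can differ slightly from an exact computation.
    """
    diffs = sorted(xi - yj for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    n_pairs = n1 * n2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    k = max(1, math.floor(n_pairs / 2 - z * sd_u))   # 1-based rank of the lower limit
    return diffs[k - 1], diffs[n_pairs - k]
```

Passing the control group as `x` reproduces the "control minus interface" (or "Col-0 minus mutant") sign convention used above.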

Code availability. ShortStack (small-RNA-seq analysis), strucVis (visualization of predicted RNA secondary structures with overlaid small-RNA-seq depths), GSTAr.pl (prediction of miRNA targets) and Shuffler.pl/targetfinder.pl (prediction of miRNA targets controlling for false discovery rate) are all freely available at https://github.com/MikeAxtell. Cutadapt is freely available at http://cutadapt.readthedocs.io/en/stable/index.html. The R package DESeq2 is freely available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.
Data availability. Small-RNA-seq data from this work are available at NCBI GEO under accession GSE84955 and NCBI SRA under project PRJNA408115. The draft, preliminary C. campestris genome and transcriptome assemblies used in this study are available at the Parasitic Plant Genome Project website at http://ppgp.huck.psu.edu. C. campestris miRNA loci have been registered with miRBase. Source data availability: Fig. 1b, in Supplementary Data 2 and 3; Figs 1c, 2c, d, 3b, 4b and c, in Supplementary Fig. 1; Fig. 3a, c and Extended Data Fig. 4, included as Source Data; Fig. 4a, in Extended Data Table 1. There are no restrictions on data availability, and the corresponding author will provide any data not already included as Supplementary Data or as Source Data upon request.
Extended Data Figure 1 | PCR of C. campestris miRNA loci. Genomic DNA isolated from C. campestris seedlings four days after germination was used
as template; the seedlings had never attached to nor been near a host plant, ruling out host DNA contamination. trnL-F, positive control plastid locus. 
Experiment performed once. 
Extended Data Figure 2 | C. campestris miRNAs cause slicing and phased siRNA production from host mRNAs. Small-RNA-seq coverage across the indicated A. thaliana transcripts is shown in blue for host stem, interface and parasite stem samples. For display, the two biological replicates of each type were merged. Red marks and vertical lines show positions of complementary sites to C. campestris miRNAs, with the miRNA alignments shown above. Fractions indicate numbers of 5'-RLM-RACE clones with 5' ends at the indicated positions; the locations in red are the predicted sites for miRNA-directed slicing remnants. Bar charts show the length and polarity distribution of transcript-mapped siRNAs. Radar charts show the fractions of siRNAs in each of the 21 possible phasing registers; the registers highlighted in magenta are those predicted by the miRNA target sites.
Extended Data Figure 3 | Possible miRNA target sites within endogenous C. campestris mRNAs. Note that none of these mRNAs showed evidence of
secondary siRNA accumulation, and the complementarity of these sites was generally poor. 
Extended Data Figure 4 | Growth of C. campestris on A. thaliana sgs2-1 and dcl4-2t mutants with varying methodologies, as indicated. a–d, P values (Wilcoxon rank-sum tests, unpaired, two-tailed) from comparison of mutant to wild-type (Col-0) are shown. Box plots show the median, box edges represent the first and third quartiles, the whiskers extend to 1.5× interquartile range, and all data are shown as dots. a, n = 16 and 9 biologically independent samples for Col-0 and sgs2-1, respectively. 95% confidence interval (Col-0 minus sgs2-1), 0.120 to 0.000. b, n = 10 and 8 biologically independent samples for Col-0 and dcl4-2t, respectively. 95% confidence interval (Col-0 minus dcl4-2t), −0.052 to 0.012. c, n = 11 and 10 biologically independent samples for Col-0 and dcl4-2t, respectively. 95% confidence interval (Col-0 minus dcl4-2t), −0.014 to 0.018. d, n = 8 and 9 biologically independent samples for Col-0 and dcl4-2t, respectively. 95% confidence interval (Col-0 minus dcl4-2t), −0.184 to 0.008. e, n = 14, 14 and 12 biologically independent samples for Col-0, sgs2-1 and dcl4-2t, respectively. 95% confidence interval (Col-0 minus sgs2-1), −0.020 to 0.090. 95% confidence interval (Col-0 minus dcl4-2t), −0.010 to 0.110.
Extended Data Figure 5 | Highly reproducible induction of C. campestris miRNAs in different hosts. a, Mean abundance plot from the original experiment on A. thaliana hosts of C. campestris small-RNA loci comparing interface to parasite stem samples. Significantly upregulated loci are highlighted (alternative hypothesis: true difference >2-fold, FDR < 0.05 after correction for multiple testing with the Benjamini-Hochberg procedure). Reproduced from Fig. 1a. b, c, As a, except for a new set of A. thaliana hosts (b) or for an experiment using N. benthamiana as hosts (c). Significantly upregulated loci are highlighted (alternative hypothesis: true difference >2-fold, FDR < 0.05 after correction for multiple testing with the Benjamini-Hochberg procedure). d, Area-proportional Euler diagram showing overlaps of upregulated C. campestris 21–22-nucleotide miRNA loci among the three small-RNA-seq experiments. The locations of the six miRNA loci of special interest are highlighted in green.

[Figure labels: Original cca × ath. Highlighted loci: MIR-Cluster_102537 (ccm-MIR12480; SEOR1), MIR-Cluster_57001 (ccm-MIR12494b; SCZ/HSFB4), MIR-Cluster_115407 (ccm-MIR12497a; TIR/AFB), MIR-Cluster_67631 (ccm-MIR12497b; TIR/AFB), MIR-Cluster_54651 (ccm-MIR12495; BIK1-3p), MIR-Cluster_105391 (ccm-MIR12463b; BIK1-5p).]
Extended Data Figure 6 | C. campestris miRNAs cause slicing and phased siRNA production from N. benthamiana mRNAs. Small-RNA-seq coverage across the indicated N. benthamiana transcripts is shown in blue for host stem, interface and parasite stem samples. For display, the two biological replicates of each type were merged. Red marks and vertical lines show positions of complementary sites to C. campestris miRNAs, with the alignments shown above. Fractions indicate numbers of 5'-RLM-RACE clones with 5' ends at the indicated positions; the locations in red are the predicted sites for miRNA-directed slicing remnants. Bar charts show the length and polarity distribution of transcript-mapped siRNAs. Radar charts show the fractions of siRNAs in each of the 21 possible phasing registers; the registers highlighted in magenta are those predicted by the miRNA target sites.
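The radar charts summarize phasing by binning siRNA 5' ends into the 21 possible registers relative to the miRNA-directed cleavage site. The sketch below shows one way to compute those fractions. It uses 1-based 5'-end positions of sense-strand reads only and places the anchor at the predicted 5' end of the 3' slicing remnant; both are assumptions. Antisense reads would additionally need the 2-nucleotide 3'-overhang offset of the siRNA duplex. The function name is hypothetical.

```python
from collections import Counter


def phasing_fractions(five_prime_ends, anchor, period=21):
    """Fraction of sense-strand siRNA 5' ends falling in each of `period` registers.

    Register r holds reads whose 5' end lies r nucleotides (modulo `period`) downstream
    of `anchor`; register 0 is the register predicted by the slicing site.
    """
    counts = Counter((pos - anchor) % period for pos in five_prime_ends)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * period
    return [counts[r] / total for r in range(period)]
```

Python's `%` always returns a non-negative result here, so reads upstream of the anchor are assigned to registers in the same way as downstream reads.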




LETTER 


Extended Data Figure 7 | C. campestris pre-haustoria. a, C. campestris seedling wound around a bamboo stake. b, The same seedling, removed from the stake to show the prominent pre-haustorial bumps. The seedling was scarified, germinated on moist paper towels for three days at ~28 °C and then placed next to a bamboo stake for four days with far-red LED lighting. Approximately 30 such seedlings were used for the pre-haustoria RNA in Fig. 4b. Scale bars, 1 mm.
Extended Data Table 1 | Predicted miRNA targets in multiple plant species

Phytozome 11 species code  Species                BIK1  SEOR1  SCZ/HSFB4  TIR1  GAPDH
Org_Acoerulea              Aquilegia coerulea     0     1      0          1     0
Org_Alyrata                Arabidopsis lyrata     1     1      0          1     0
Org_Athaliana              Arabidopsis thaliana   1     1      1          1     0
Org_BrapaFPsc              Brassica rapa          1     0      1          1     0
Org_Bstricta               Boechera stricta       1     0      0          1     0
Org_Cclementina            Citrus clementina      0     1      0          0     0
Org_Cgrandiflora           Capsella grandiflora   1     0      0          1     0
Org_Cpapaya                Carica papaya          1     0      1          1     0
Org_Crubella               Capsella rubella       1     0      0          1     0
Org_Csativus               Cucumis sativus        0     0      0          0     0
Org_Csinensis              Citrus sinensis        1     1      0          1     0
Org_Egrandis               Eucalyptus grandis     0     1      1          1     0
Org_Esalsugineum           Eutrema salsugineum    1     0      0          1     0
Org_Fvesca                 Fragaria vesca         0     NA     NA         1     1
Org_Gmax                   Glycine max            1     1      1          1     0
Org_Graimondii             Gossypium raimondii    1     0      1          1     0
Org_Kmarnieriana           Kalanchoe marnieriana  0     1      1          1     0
Org_Lusitatissimum         Linum usitatissimum    0     1      0          0     0
Org_Mdomestica             Malus domestica        0     NA     0          1     0
Org_Mesculenta             Manihot esculenta      1     1      0          1     0
Org_Mguttatus              Mimulus guttatus       1     NA     0          1     0
Org_Mtruncatula            Medicago truncatula    1     1      0          1     0
Org_Ppersica               Prunus persica         0     NA     0          1     0
Org_Ptrichocarpa           Populus trichocarpa    1     1      1          1     0
Org_Pvulgaris              Phaseolus vulgaris     0     1      0          1     0
Org_Rcommunis              Ricinus communis       1     1      0          1     0
Org_Slycopersicum          Solanum lycopersicum   0     1      1          1     0
Org_Spurpurea              Salix purpurea         0     1      1          0     0
Org_Stuberosum             Solanum tuberosum      1     NA     1          1     0
Org_Tcacao                 Theobroma cacao        0     1      1          1     0
Org_Vvinifera              Vitis vinifera         0     1      1          1     0

Targets were predicted using targetfinder.pl, keeping all hits with a score of 4.5 or less. Probable orthologues of the indicated A. thaliana genes were found using BLASTP against the 31 eudicot species present in Phytozome 11, keeping up to the top 100 BLAST hits. miRNA queries were all mature miRNA and miRNA* sequences from C. campestris interface-induced miRNA loci. NA indicates that no probable orthologues were recovered from a given species; 1 indicates that there were one or more predicted targets in that species; 0 indicates that there were no predicted targets. GAPDH orthogroup, negative control.
Acoustic reporter genes for noninvasive imaging of microorganisms in mammalian hosts

Raymond W. Bourdeau, Audrey Lee-Gosselin, Anupama Lakshmanan, Arash Farhadi, Sripriya Ravindra Kumar, Suchita P. Nety & Mikhail G. Shapiro


The mammalian microbiome has many important roles in health and disease, and genetic engineering is enabling the development of microbial therapeutics and diagnostics. A key determinant of the activity of both natural and engineered microorganisms in vivo is their location within the host organism. However, existing methods for imaging cellular location and function, primarily based on optical reporter genes, either perform poorly in deep tissue owing to light scattering or require radioactive tracers. Here we introduce acoustic reporter genes, which are genetic constructs that allow bacterial gene expression to be visualized in vivo using ultrasound, a widely available, inexpensive technique with deep tissue penetration and high spatial resolution. These constructs are based on gas vesicles, a unique class of gas-filled protein nanostructures that are expressed primarily in water-dwelling photosynthetic organisms as a means to regulate buoyancy. Heterologous expression of engineered gene clusters encoding gas vesicles allows Escherichia coli and Salmonella typhimurium to be imaged noninvasively at volumetric densities below 0.01% with a resolution of less than 100 μm. We demonstrate the imaging of engineered cells in vivo in proof-of-concept models of gastrointestinal and tumour localization, and develop acoustically distinct reporters that enable multiplexed imaging of cellular populations. This technology equips microbial cells with a means to be visualized deep inside mammalian hosts, facilitating the study of the mammalian microbiome and the development of diagnostic and therapeutic cellular agents.

Gas vesicles comprise all-protein shells with sizes of approximately 200 nm that enclose hollow interiors, and allow dissolved gases to permeate freely in and out while excluding water. We recently discovered the ability of these protein nanostructures to scatter sound waves and thereby produce ultrasound contrast. However, the ability of the multi-gene clusters encoding gas vesicles to serve as reporter genes in heterologous species has not been demonstrated. Gas vesicles are encoded in their native bacterial or archaeal hosts by operons of 8–14 genes. These include the primary structural protein GvpA, the optional external scaffolding protein GvpC, and several secondary proteins that function as essential minor constituents or chaperones. As a starting point for developing acoustic reporter genes (ARGs), we chose a compact E. coli-compatible gas vesicle gene cluster from Bacillus megaterium (Fig. 1a; top left). Cells containing this construct were able to produce small, bicone-shaped gas vesicles (Fig. 1b, c; left). However, its expression did not result in bacteria that were detectable by ultrasound (Fig. 1d; left), most probably because the small gas vesicles produced from this construct scatter sound only weakly. At the same time, transforming E. coli with a gas vesicle gene cluster derived from the cyanobacterium Anabaena flos-aquae, whose gas vesicles are highly echogenic, did not result in gas vesicle expression. Given the high sequence homology of GvpA between organisms (Extended Data Fig. 1), we hypothesized that a combination of the structural gvpA genes from




A. flos-aquae with the accessory genes gvpR–gvpU from B. megaterium (Fig. 1a; middle) would result in the formation of gas vesicles with characteristics favourable for ultrasound imaging. Indeed, expression of this engineered gene cluster produced E. coli containing gas vesicles that were substantially larger than those from the B. megaterium operon. These nanostructures also appeared to occupy a greater fraction of intracellular volume (Fig. 1b, c; middle). Notably, these cells produced robust ultrasound contrast compared to green fluorescent protein (GFP) controls (Fig. 1d; middle). We then added a gene encoding the A. flos-aquae scaffolding protein GvpC (Fig. 1a; right). This resulted in wider and more elongated gas vesicles that more closely resembled those native to A. flos-aquae (Fig. 1b, c; right), and generated stronger ultrasound contrast (Fig. 1d; right). We refer to this optimized genetically engineered construct as acoustic reporter gene 1, or arg1.

To confirm that the ultrasound signal from arg1-expressing cells is due to the presence of gas vesicles, we applied acoustic pulses with amplitudes above the critical collapse pressure of the gas vesicles. In purified samples, this resulted in the immediate collapse of these protein nanostructures and dissolution of their gas contents, eliminating ultrasound contrast. As expected, the application of high-pressure pulses made cells expressing arg1 invisible to ultrasound (Fig. 1d). The ability of ARG-based contrast to be erased in situ is used throughout this study to confirm the source of acoustic signals and to subtract background.
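A collapse pulse erases gas-vesicle contrast irreversibly but leaves tissue or agarose background unchanged. Subtracting a post-collapse image from a pre-collapse image therefore isolates the ARG-specific signal. The sketch below assumes two co-registered images stored as nested lists of linear intensities and a boolean region-of-interest mask. The function name is hypothetical, and this is not the authors' processing code.

```python
def collapse_sensitive_signal(pre, post, roi):
    """Mean (pre - post) intensity over pixels where roi is True.

    pre, post: co-registered images (nested lists, linear intensity scale).
    roi: nested list of booleans with the same shape.
    """
    diffs = [
        p - q
        for pre_row, post_row, roi_row in zip(pre, post, roi)
        for p, q, keep in zip(pre_row, post_row, roi_row)
        if keep
    ]
    if not diffs:
        raise ValueError("ROI selects no pixels")
    return sum(diffs) / len(diffs)
```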

arg1 expression resulted in gas vesicle contents of 9.4 ± 0.4 mg g⁻¹ E. coli (n = 3, mean ± s.e.m.), corresponding to approximately 100 gas vesicles per cell. These nanostructures occupy roughly 10% of the intracellular space. Acoustically silent cells expressing the B. megaterium gene cluster produced a similar quantity of gas vesicle proteins (9.7 ± 1.5 mg g⁻¹, n = 3). This underscores the importance of genetic engineering in producing intracellular nanostructures with the appropriate size and shape to be detected by ultrasound. A fraction of arg1-expressing cells was buoyant in aqueous medium (Extended Data Fig. 2a, b), suggesting that in these cells gas vesicles occupy more than 10% of the volume. However, the expected buoyant force on these cells, even at much higher expression levels, is weak compared with other forces such as flagellar thrust (Supplementary Table 1).

To determine the detection limit of ARG-expressing cells, we imaged a concentration series of E. coli transformed with arg1 (Fig. 2a). Cells at concentrations as low as 5 × 10⁷ cells ml⁻¹ produced a detectable signal (Fig. 2a, b). This equates to a volume fraction of roughly 0.005%, or approximately 100 cells per voxel based on cubic voxel dimensions of 100 μm. This sensitivity should be sufficient for many in vivo scenarios. Furthermore, bacteria enriched for buoyancy before imaging provide a 2.4-fold higher signal (Extended Data Fig. 2c, d), suggesting that sensitivity could be improved further by optimizing ARG expression.
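The detection-limit arithmetic can be checked with a few lines of Python. The ~1 μm³ E. coli cell volume is our assumption, because the text does not state it. The 5 × 10⁷ cells ml⁻¹ threshold and the 100-μm cubic voxel come from the paragraph above.

```python
CELL_VOLUME_UM3 = 1.0    # assumed E. coli cell volume (~1 um^3); not stated in the text
UM3_PER_ML = 1e12        # 1 ml = 1 cm^3 = 10^12 um^3


def volume_fraction(cells_per_ml, cell_volume_um3=CELL_VOLUME_UM3):
    """Fraction of the sample volume occupied by cells."""
    return cells_per_ml * cell_volume_um3 / UM3_PER_ML


def cells_per_voxel(cells_per_ml, voxel_edge_um=100.0):
    """Expected number of cells in a cubic voxel of the given edge length."""
    return cells_per_ml * voxel_edge_um ** 3 / UM3_PER_ML


threshold = 5e7  # cells per ml, detection limit reported above
print(f"volume fraction: {volume_fraction(threshold):.3%}")   # 0.005%
print(f"cells per voxel: {cells_per_voxel(threshold):.0f}")   # 50
```

With these inputs the volume fraction is 5 × 10⁻⁵ (0.005%), matching the text. The voxel count comes out at about 50 cells, the same order of magnitude as the approximate figure quoted above.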

To test whether ARGs could provide a read-out of state-dependent 
genetic pathways, we placed arg1 under the control of a promoter 
Figure 1 | Genetic engineering of acoustic reporter genes. a, Organization of acoustic reporter gene clusters; the region highlighted in grey was varied. Panels b–d are organized in columns that correspond to each of the variant constructs. b, TEM images of representative E. coli cells expressing each construct. c, TEM images of gas vesicles isolated from E. coli expressing each construct. d, Ultrasound images of agarose phantoms containing E. coli expressing each construct or GFP. The cell concentration is 10⁹ cells ml⁻¹. Images in the bottom panels were acquired after acoustic collapse. Dotted blue outlines indicate the location of each specimen. Colour bar represents linear signal intensity. Scale bars, 500 nm (b), 250 nm (c) and 2 mm (d). All imaging experiments were repeated three times with similar results.


regulated by the chemical inducer isopropyl-β-D-thiogalactoside (IPTG). Ultrasound signals from E. coli expressing ARGs in this configuration followed the expected dose–response curve of IPTG-controlled expression (Fig. 2c, d), confirming their ability to serve as the output signal for engineered genetic circuits. Significant ultrasound contrast could be observed 4 h after IPTG induction (P = 0.01, n = 4),
Figure 2 | Imaging dilute bacterial populations and dynamically regulated gene expression. a, Ultrasound images of arg1-expressing E. coli at various cellular concentrations, before and after acoustic collapse. b, Mean ultrasound contrast from E. coli expressing arg1 and GFP at various cell densities. Data are from three biological replicates; lines indicate the mean. AU, arbitrary units. c, Ultrasound images of E. coli expressing arg1 after induction with various concentrations of IPTG. Cell concentration was 5 × 10⁹ cells ml⁻¹. d, Normalized ultrasound contrast as a function of IPTG concentration. Data are from three biological replicates; the line shows a fit of the data with the Hill equation to facilitate visualization. Each imaging experiment was repeated three times with similar results. Scale bars, 2 mm.
Figure 3 | Multiplexed imaging of genetically engineered reporter variants. a, Diagram of the gvpA and gvpC sequences included in the arg1 and arg2 gene clusters. b, Ultrasound images of a gel phantom containing E. coli expressing GFP or arg2 (10⁹ cells ml⁻¹). Dotted blue outlines indicate the location of each specimen. c, TEM images of isolated arg2 gas vesicles. d, Ultrasound images of gel phantoms containing arg1 or arg2 before collapse, after collapse at 2.7 MPa and after collapse at 4.7 MPa (10⁹ cells ml⁻¹). e, Overlay of the blue and orange maps from spectral unmixing of arg2 and arg1, based on the series of images in d. Scale bars, 2 mm (b, d, e) and 250 nm (c). Each imaging experiment was repeated three times with similar results.


and continued to increase during the 22-h culturing period (Extended 
Data Fig. 3). 
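Figure 2d fits normalized contrast against IPTG concentration with the Hill equation only to guide the eye. A minimal form of that curve is sketched below. The parameter names and example values (half-maximal concentration, Hill coefficient) are hypothetical, since no fitted values are reported here. In practice the parameters would be estimated by nonlinear least squares, for example with `scipy.optimize.curve_fit`.

```python
def hill_response(conc, top, k_half, n, bottom=0.0):
    """Hill-type dose response: bottom + (top - bottom) * c**n / (k_half**n + c**n).

    conc: inducer concentration; k_half: concentration giving a half-maximal response;
    n: Hill coefficient (steepness). All values here are illustrative.
    """
    if conc <= 0:
        return bottom
    return bottom + (top - bottom) * conc ** n / (k_half ** n + conc ** n)
```

At `conc == k_half` the response is midway between `bottom` and `top`, which is what makes `k_half` a convenient summary of inducer sensitivity.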

To determine whether the expression of ARGs has any deleterious effect on host cells, we measured the growth curves of E. coli expressing arg1 or GFP. After induction, cells expressing either construct continued to divide and reached similar saturation densities (Extended Data Fig. 4a). For both arg1 and GFP, the final density was lower than in uninduced controls, as expected from the metabolic demand of protein expression. We also assessed the viability of ARG-expressing cells after ultrasound imaging and acoustic collapse. Transmission electron microscopy (TEM) images of cells acquired before and after exposure to collapsing acoustic pulses show that gas vesicles can be eliminated without any obvious cellular damage (Extended Data Fig. 4b). To examine the effect of ultrasound exposure on cell growth, we cultured E. coli expressing arg1 as colonies on solid medium and applied acoustic collapse pulses to half of the agar plate. The collapse of gas vesicles in insonated cells was confirmed by a decrease in optical scattering (Extended Data Fig. 4c, d). After incubation for an additional 20 h, the diameter of the insonated colonies did not differ significantly from that of un-insonated controls. This indicates that ultrasound exposure does not affect cell viability (Extended Data Fig. 4e). Notably, insonated colonies re-expressed gas vesicles during this period, as indicated by the restoration of pressure-sensitive light scattering (Extended Data Fig. 4e, f).

It is often informative to image more than one population of cells simultaneously, as is done optically using spectrally distinct fluorescent proteins. Analogous acoustic multiplexing can be performed with genetic variants of gas vesicles that collapse at different pressures. Multiple images are acquired during the sequential application of increasing pressure pulses (Supplementary Note 1). To explore whether this could be done with ARGs, we constructed a new version of the ARG-expressing gene cluster containing a modified version of A. flos-aquae gvpC. Deletion or truncation of this outer scaffolding protein results in gas vesicles with lower collapse pressures, allowing the production of nanostructures that are distinguishable from each other under ultrasound. Using this approach, we modified our gene cluster by truncating GvpC to retain only one of its five repeating α-helical domains (Fig. 3a). E. coli expressing the resulting gene cluster, which we refer to as arg2, showed robust production of gas vesicles and ultrasound contrast, similar to arg1 (Fig. 3b, c and Extended Data Fig. 5a–c). Consistent with our design, gas vesicles purified from arg2-expressing E. coli had a lower critical hydrostatic collapse pressure


4 JANUARY 2018 | VOL 553 | NATURE | 87 





Figure 4 | Ultrasound imaging of bacteria in the gastrointestinal tract. a, Diagram of gastrointestinal (GI) imaging experiment. b, Representative TEM images of whole ECN cells expressing arg1 or the lux operon. Images were acquired from three biologically independent samples for arg1 and one for lux (approximately 35 cells imaged in each sample) with similar results. c, Ultrasound images of a gel phantom containing ECN expressing arg1 or the lux operon. Experiment repeated five times with similar results. d, Mean collapse-sensitive ultrasound signal in phantoms containing ECN cells expressing arg1 or lux. Line represents mean (P = 0.0007 using a two-sided heteroscedastic t-test, n = 5). Cell concentration in c, d was 10⁹ cells ml⁻¹. e, Transverse ultrasound image of a mouse whose colon


than nanostructures formed by cells expressing arg1 (Extended Data 
Fig. 5d), and cellular arg2 contrast was erasable at lower acoustic pres- 
sures (Extended Data Fig. 5e). The distinct collapse spectra of the two 
variants (Extended Data Fig. 5f) allowed E. coli expressing arg1 and 
arg2 to be imaged in multiplex using pressure spectrum unmixing 
(Fig. 3d, e). 
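Pressure-spectrum unmixing relies on arg2 contrast being erased at lower pressures than arg1 contrast. If each collapse step removes a known fraction of each variant's signal, the signal lost at each step is a linear mixture of the two populations. That mixture can be inverted pixel by pixel. The sketch below assumes linear signal superposition and two collapse steps, such as the 2.7 and 4.7 MPa pulses of Fig. 3d. The collapse fractions are hypothetical calibration inputs, and this is an illustration rather than the procedure described in Supplementary Note 1.

```python
def unmix_pixel(s_before, s_after_low, s_after_high, profiles):
    """Estimate arg2 and arg1 contributions at one pixel from a two-step collapse series.

    profiles = ((a, b), (c, d)): a and b are the fractions of arg2 and arg1 signal
    erased by the lower-pressure step; c and d are the fractions erased by the
    higher-pressure step. These values are hypothetical calibration inputs.
    """
    d_low = s_before - s_after_low       # signal erased by the lower-pressure step
    d_high = s_after_low - s_after_high  # signal erased by the higher-pressure step
    (a, b), (c, d) = profiles
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("collapse profiles are not distinguishable")
    arg2 = (d_low * d - b * d_high) / det   # Cramer's rule for the 2x2 system
    arg1 = (a * d_high - c * d_low) / det
    return arg2, arg1
```

The determinant check makes the design constraint explicit. If the two variants lost signal in the same proportions at each step, their contributions could not be separated, which is why a GvpC truncation that shifts the collapse pressure is needed.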

After establishing the core capabilities of ARGs in vitro, we set out to demonstrate their detectability in vivo by imaging ARG-expressing cells in biologically relevant anatomical contexts. One important target for in vivo microbial imaging is the mammalian gastrointestinal tract, given the effect of the gut microbiome on the host's health and the development of gastrointestinal-targeted microbial therapeutics. Owing to its location deep inside the body, the gastrointestinal tract is difficult to image using optical techniques. To establish a proof of concept for ultrasonic imaging of microorganisms in this context, we expressed ARGs in a probiotic bacterial strain. We then assessed the ability of ultrasound to localize this bacterium inside the colon (Fig. 4a) in comparison with bioluminescent imaging. The E. coli strain Nissle 1917 (ECN) is a probiotic microorganism capable of colonizing the mammalian gastrointestinal tract. ECN has been used clinically in humans for 100 years to treat enteric infection and inflammatory bowel conditions, and is a common chassis for therapeutic synthetic biology. ECN cells transformed with a plasmid expressing arg1 produced abundant gas vesicles (Fig. 4b) and ultrasound contrast (Fig. 4c, d). For comparison, we transformed ECN cells with the luminescence operon luxABCDE (lux), which has previously been used to visualize gene expression in microbial populations in vivo using bioluminescent imaging. lux-expressing ECN cells produced no ultrasound contrast (Fig. 4c, d).

To establish a proof of concept for ultrasound imaging of ARG- 
expressing bacteria within the gastrointestinal tract, and to compare 
the result with bioluminescent imaging, we introduced ECN 
cells expressing arg1 or lux into the colons of anaesthetized mice. 
contains ECN expressing arg1 proximal to the colon wall, and ECN 
expressing lux at the centre of the lumen. f, Luminescence image of a mouse 
with the same arrangement of colonic bacteria. g, h, As in e and f, but with 
ECN expressing arg1 at the centre of the lumen and ECN expressing lux 
at the periphery. Cells are loaded at a final concentration of 10⁹ cells ml⁻¹. 
In e and g, a difference heat map of ultrasound contrast within the colon 
region of interest before and after acoustic collapse is overlaid on a 
greyscale anatomical image. In f and h, a thresholded luminescence map is 
overlaid on a bright-field image of the mouse. Scale bars, 500 nm (b), 2 mm 
(c) and 2.5 mm (e, g). In vivo imaging experiments were repeated three 
times with similar results. 


To assess the ability of each modality to resolve the spatial distribution 
of bacteria within the colon, we injected the arg1 and lux cells into 
the centre or periphery of the colonic lumen (Fig. 4e-h). Ultrasound 
images clearly revealed the localization of ARG-expressing ECN cells 
in the appropriate region of the colon (Fig. 4e, g) at concentrations 
of 10⁹ cells ml⁻¹, which is within the range of certain commensal 
and therapeutic scenarios, and below the density reached by ECN in 
gnotobiotic models. By contrast, bioluminescent images showed 
only that the bacteria are present somewhere in the mouse abdomen 
(Fig. 4f, h). To facilitate visualization of ARG-specific signals, our ultra- 
sound image analysis used background subtraction after gas vesicle 
collapse, with the resulting contrast overlaid on greyscale anatomical 
images to show the location of the bacteria within the context of other 
internal organs. Alternatively, ARG-expressing cells can also be seen in 
the colon in raw ultrasound images (Extended Data Fig. 6). Contrast 
from colon-localized E. coli was consistent across mice (Extended Data 
Fig. 7). These results establish the ability of ARGs to make genetically 
labelled microorganisms visible noninvasively in deep tissue, and 
demonstrate the advantage of ultrasound relative to optical imaging 
in terms of spatial localization within deep organs. 

Some degree of burden is expected to accompany heterologous 
protein expression. To assess the burden on ECN cells presented by 
arg1, we characterized their growth, viability, maintenance of reporter 
expression and release of microcins. We observed that arg1 expression 
is generally well tolerated, with some scope for optimization (Extended 
Data Fig. 8 and Supplementary Note 2). 

In addition to the gastrointestinal tract, another emerging appli- 
cation of engineered microorganisms is in antitumour therapies and 
diagnostics. To test whether such microorganisms could be imaged 
with ultrasound, and assess whether ARGs could be generalized to 
additional species besides E. coli, we adapted the genetic construct 
encoding arg1 for expression in the attenuated, tumour-homing 
S. typhimurium strain ELH1301 (refs 16, 30), and showed that we could 
Figure 5 | High-throughput screening of acoustic phenotypes. 

a, Illustration of acoustic colony screening. b, Colony ultrasound images 
of a mixed population of E. coli colonies expressing arg1, arg2, and GFP. 
Images were acquired before collapse and after collapse at peak acoustic 
pressures of 4 and 6 MPa. This imaging experiment was performed once; 


image these cells after injection into tumours (Extended Data Fig. 9 and 
Supplementary Note 3). 

Finally, to facilitate future genetic engineering of ARGs, we assessed 
the amenability of these constructs to high-throughput screening. 
In fluorescent protein engineering, directed evolution has served 
as an effective approach to identify variants with new spectral and 
biochemical properties, often using mutant bacterial colonies as 
a convenient platform for high-throughput screening. To determine 
whether a similar approach could be used with ARGs, we developed 
a method to scan bacterial colonies with ultrasound (Fig. 5a). In this 
method, colonies are immobilized on agar plates with an over-layer 
of agarose, then scanned with an ultrasound transducer translated 
by a computer-controlled robot. This results in a series of transverse 
images that can be reconstructed to form an in-plane image of the 
plate (Fig. 5b). We used this technique to image a mixed plate of 
E. coli transformed with arg1, arg2 or GFP. Serial acoustic collapse 
imaging (Fig. 5b) revealed three distinct colony populations (Fig. 5c 
and Extended Data Fig. 10), allowing the genotypes to be distinguished 
from each other with 100% accuracy (Fig. 5d). This result suggests 
that colony screening can discriminate acoustic phenotypes with suffi- 
cient accuracy to serve as a high-throughput assay for acoustic protein 
engineering. 
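The serial-collapse logic that lets the three genotypes be told apart can be illustrated with a simple rule-based classifier. This is a minimal sketch, not the classification procedure used in the study: the function `classify_colony` and both thresholds (`min_drop`, `early_fraction`) are hypothetical, and it assumes that each colony's summed signal is available before collapse and after the 4-MPa and 6-MPa collapse steps shown in Fig. 5b. Because arg2 contrast is erased at lower acoustic pressures than arg1 contrast, a colony that loses most of its collapsible signal by 4 MPa is called arg2, while GFP colonies have essentially no collapse-sensitive signal.

```python
def classify_colony(before, after_4mpa, after_6mpa,
                    min_drop=0.1, early_fraction=0.5):
    """Assign a hypothetical acoustic phenotype from serial-collapse signals.

    before, after_4mpa, after_6mpa: colony signal (arbitrary units) before
    collapse and after collapse pulses at 4 and 6 MPa peak pressure.
    min_drop and early_fraction are illustrative thresholds, not values
    reported in the study.
    """
    total_drop = before - after_6mpa            # all collapse-sensitive signal
    if total_drop <= min_drop:
        return "GFP"                            # no gas vesicle contrast to erase
    early = (before - after_4mpa) / total_drop  # fraction already erased at 4 MPa
    return "arg2" if early >= early_fraction else "arg1"


# Synthetic colonies: (before, after 4 MPa, after 6 MPa)
colonies = [(1.0, 0.2, 0.1), (1.0, 0.9, 0.1), (0.1, 0.1, 0.1)]
print([classify_colony(*c) for c in colonies])  # ['arg2', 'arg1', 'GFP']
```

In practice the thresholds would have to be chosen from control colonies of known genotype, as the sequencing step in the Methods does for validation.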

Our study establishes engineered gas vesicle gene clusters as reporter 
genes for ultrasound, giving this widely used noninvasive imaging 
modality the ability to visualize genetically modified bacteria inside 
living animals. Future work will build on the in vitro and in vivo proofs 
of concept presented in this study to answer scientific and translational 
questions. This research will benefit from the development of ultra- 
sound techniques to detect ARG signals and distinguish them from 
background (Supplementary Note 4), further genetic engineering 
to optimize the stability and host burden of ARG constructs, and 
expression of these reporters in a broader range of microbial species 
(Supplementary Note 5). In addition, it is ultimately desirable to express 
ARGs in mammalian cells. 

We anticipate that the ARGs presented in this work are a starting 
point for future engineering of ultrasound reporter genes. Since their 
initial discovery as optical reporters, fluorescent proteins have been 
engineered, evolved and used in thousands of unforeseen optical 
imaging applications. Our findings that genetic engineering can be used 
to generate ARGs with distinct acoustic properties and that ARGs are 
amenable to colony-based high-throughput screening suggest that a 
similar trajectory may be available for this new technology. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Chemicals. All chemicals were purchased from Sigma Aldrich unless otherwise 
noted. 

Molecular cloning. To construct the plasmid for E. coli BL21(AI) expression 
of ARGs, the gene cluster encoding B. megaterium gas vesicle proteins 
GvpBRNFGLSKJTU was amplified from pNL29 (ref. 19) (gift from M. Cannon) 
and cloned into pET28a using Gibson assembly. The amplicon included an addi- 
tional 46 base pairs (bp) upstream of the gvpB start codon and 180 bp downstream 
of the gvpU stop codon. To generate hybrid gene clusters, the genes encoding GvpA 
and GvpC were amplified from A. flos-aquae and cloned into pET28-RNFGLSKJTU 
using Gibson assembly. A control gene encoding the green fluorescent protein 
(GFP) mNeonGreen was similarly constructed in the pET28 vector. For expression 
of ARGs in E. coli Nissle 1917, the pET28 T7 promoter was replaced by the T5 
promoter. For S. typhimurium expression, the ARG gene cluster was cloned into 
pTD103 (gift from J. Hasty). A plasmid encoding the luxCDABE gene cluster from 
Photorhabdus luminescens on the pTD103 backbone was also a gift from J. Hasty. 
Bacterial expression. Plasmids encoding ARGs or GFP were transformed into 
chemically competent E. coli BL21(AI) cells (Thermo Fisher Scientific) and grown 
in 5 ml starter cultures in LB medium with 50 µg ml⁻¹ kanamycin and 1% glucose for 
16 h at 37°C. Large-scale cultures in LB medium containing 50 µg ml⁻¹ kanamycin 
and 0.2% glucose were inoculated at a ratio of 1:100 with the starter culture. Cells 
were grown at 37°C to OD600nm = 0.5, then induced with 0.5% L-arabinose and 
0.4 mM IPTG for 22 h at 30°C. For E. coli Nissle 1917 (Ardeypharm GmbH) the 
same protocol was followed, except that constructs were electroporated into the cells 
and induction was performed at OD600nm = 0.3 with 3 µM IPTG (arg1) and 3 nM 
N-(β-ketocaproyl)-L-homoserine lactone (AHL) (lux). Strain identity of E. coli 
Nissle 1917 cells was confirmed by PCR. For S. typhimurium expression, the same 
protocol was followed, except that constructs were electroporated into S. typhimurium 
ELH1301 (gift from J. Hasty) and expression was induced with 3 nM AHL. 

Gas vesicle purification and quantification. Collected cells were centrifuged at 
350g in 50 ml conical tubes for 4 h with a liquid height <10 cm to prevent collapse 
of gas vesicles by hydrostatic pressure. For ARG variants that produce a buoyant 
band of cells, the middle layer between the buoyant cells and the sedimented cells 
was removed and discarded. For ARG variants that do not produce a buoyant 
band, the supernatant was discarded. The remaining cells were resuspended in 8 ml 
SoluLyse-Tris (L200500 Genlantis) per 100 ml culture and 250 µl ml⁻¹ lysozyme, 
and incubated for 1 h at 4°C with rotation. Subsequently, 10 µl ml⁻¹ DNase I was 
added to the lysate and incubated for 10 min at 25°C. The lysate was transferred to 
2 ml tubes and centrifuged for 2 h at 400g at 8°C. The subnatant was removed with 
a 21.5-gauge needle, and the supernatant containing the gas vesicles was transferred 
to a clean tube. PBS was added to the gas vesicles in a threefold volume excess, 
and the centrifugation, subnatant removal and PBS dilution steps were repeated three 
times. Purified gas vesicles were quantified using the Micro BCA Protein Assay Kit 
(Thermo Fisher Scientific). Gas vesicles were collapsed with hydrostatic pressure 
before quantification. Bovine serum albumin was used to generate the standard 
curve. Absorbance measurements were taken on a SpectraMax M5 spectrophotometer 
(Molecular Devices). 

TEM sample preparation and imaging. Cells expressing ARGs, or purified gas 
vesicles, were exchanged into water or 10 mM HEPES pH 8.0 with 150 mM NaCl, 
respectively, via three rounds of buoyancy purification and buffer exchange as 
described above. Samples were deposited on Formvar/carbon 200 mesh grids (Ted 
Pella) that were rendered hydrophilic by glow discharging (Emitech K100X). For 
purified gas vesicles, 2% uranyl acetate was added for staining. The samples were 
then imaged on a FEI Tecnai T12 transmission electron microscope equipped with 
a Gatan Ultrascan CCD. Images were processed with Fiji. 

Hydrostatic collapse pressure measurements. Cells expressing ARGs, or purified 
gas vesicles, were diluted to OD600nm = 1.0 in PBS and 0.4 ml was loaded into an 
absorption cell (176.700-QS, Hellma GmbH). A single-valve pressure controller 
(PC series, Alicat Scientific), supplied by a 1.5-MPa nitrogen gas source, applied 
hydrostatic pressure in the cell, while a microspectrometer (STS-VIS, Ocean 
Optics) measured the optical density of the sample at 500 nm. OD500nm was 
measured from 0 to 1.2 MPa gauge pressure with a 10-kPa step size and a 7-s 
equilibration period at each pressure. 
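As a rough sketch of the pressure ramp just described, the snippet below builds the 10-kPa step schedule from 0 to 1.2 MPa and rescales an OD500nm trace so that it reads as an intact gas vesicle fraction (as plotted in Extended Data Fig. 5d). The function names are hypothetical, and the normalization is an assumption: it treats the first reading as fully intact and the last (1.2 MPa) reading as fully collapsed, which the Methods do not state explicitly.

```python
import numpy as np

def pressure_schedule(max_kpa=1200, step_kpa=10, dwell_s=7):
    """Gauge pressures (kPa) from 0 to max_kpa inclusive, and total dwell time (s)."""
    pressures = np.arange(0, max_kpa + step_kpa, step_kpa)
    return pressures, len(pressures) * dwell_s

def normalize_od(od):
    """Rescale an OD trace to 1 at the first reading and 0 at the last.

    Assumes the first point is fully intact and the last fully collapsed.
    """
    od = np.asarray(od, dtype=float)
    return (od - od[-1]) / (od[0] - od[-1])

pressures, total_s = pressure_schedule()
print(len(pressures), total_s)  # 121 pressure steps, 847 s of equilibration
```

The schedule makes the run length explicit: 121 steps at 7 s each is about 14 min of equilibration per sample, excluding ramp time.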

In vitro ultrasound imaging. Phantoms for imaging were prepared by melting 
1% (w/v) agarose in PBS and casting wells using a custom 3D-printed template. 
Cells at 2× the final concentration were mixed in a 1:1 ratio with molten agarose 
(at 50°C) and immediately loaded into the phantom. The concentration of cells 
was determined before loading by measuring their OD600nm after exposure to 
1.2 MPa hydrostatic pressure to eliminate any contribution to light scattering 
from gas vesicles. The optical density was then converted into cells per ml using 
the relationship 1 OD = 8 × 10⁸ cells ml⁻¹ (https://www.genomics.agilent.com/biocalculators/calcODBacterial.jsp). 
Cell samples collected at early time points 
following induction, which had an optical density insufficient for loading, were 
first concentrated using centrifugation at 350g. Ultrasound imaging was 
performed using a Verasonics Vantage programmable ultrasound scanning system 
and an L22-14v 128-element linear array transducer (Verasonics). The transducer 
was mounted on a computer-controlled 3D translatable stage (Velmex). Images 
were acquired with conventional B-mode imaging, using a 128-ray-line 
protocol with a synthetic aperture to form a focused excitation beam. The 
transmit waveform was set to a frequency of 19 MHz, 67% intra-pulse duty cycle, 
and a one-cycle pulse. Samples were positioned 6 mm from the transducer face, 
which is the elevation focus of the L22-14v transducer, coupled through a layer 
of PBS. The transmit beam was also digitally focused at 6 mm. For imaging, the 
transmit voltage was 2 V and the f-number was 3, resulting in a peak positive 
pressure of 0.4 MPa. Backscattered ultrasound signals were filtered with a 7-MHz 
bandpass filter centred at 19 MHz. Signals backscattered from four transmit events 
were summed before image processing. Pixel gain was set to 3 and persistence to 90. 
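The cell-concentration bookkeeping for phantom loading can be sketched as follows. Only the conversion factor (1 OD = 8 × 10⁸ cells ml⁻¹) and the 2× stock mixed 1:1 with agarose come from the text; the helper names `cells_per_ml` and `prepare_2x_stock` are hypothetical, and the OD is assumed to be measured after hydrostatic collapse of the gas vesicles, as described above.

```python
CELLS_PER_ML_PER_OD = 8e8  # 1 OD600 = 8 x 10^8 cells per ml (conversion used in the Methods)

def cells_per_ml(od600_collapsed):
    """Convert OD600 (measured after hydrostatic collapse) to cells per ml."""
    return od600_collapsed * CELLS_PER_ML_PER_OD

def prepare_2x_stock(od600_collapsed, final_cells_per_ml, stock_ml):
    """Return (culture_ml, pbs_ml) giving stock_ml of cells at 2x the final
    concentration, ready to be mixed 1:1 with molten agarose."""
    target = 2.0 * final_cells_per_ml
    available = cells_per_ml(od600_collapsed)
    if available < target:
        # Mirrors the Methods: dilute samples are concentrated by centrifugation first.
        raise ValueError("culture too dilute; concentrate by centrifugation first")
    culture_ml = stock_ml * target / available
    return culture_ml, stock_ml - culture_ml

print(prepare_2x_stock(5.0, 5e8, 1.0))  # (0.25, 0.75)
```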

For gas vesicle collapse using the L22-14v array, we set the f-number to 0.2 
(thereby ensuring that all transducer elements were active) and scanned the 
transmit focus from 3 mm to 9 mm. During the 10-s collapse scan, single-cycle 
pulses were applied using a ray-lines protocol at 19 MHz with a frame rate of 
12 frames per second. To measure gas vesicle collapse in ARG-expressing cells as 
a function of acoustic pressure, images were acquired as described above at a peak 
positive pressure of 0.4 MPa after sequentially exposing the samples to collapse 
pulses of increasing amplitude, with pressures that varied from 0.55 MPa to 
4.7 MPa. To achieve complete collapse, we applied the maximal pressure of 4.7 MPa. 
Collapse data were fitted with a Boltzmann sigmoid function to facilitate 
visualization of collapse curves. This function is of the form 

$$f(p) = \frac{1}{1 + e^{(p - p_c)/s}},$$ 

where $p$ is the pressure, and $p_c$ and $s$ are fitted parameters representing the collapse 
midpoint and slope, respectively. For spectral unmixing, the two collapse pressures 
applied were 2.7 MPa and 4.7 MPa. Transducer output pressures were measured in 
a degassed water tank using a fibre-optic hydrophone (Precision Acoustics). 
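A minimal NumPy sketch of the Boltzmann sigmoid above, and of the collapse spectrum obtained by differentiating it with respect to pressure (as in Extended Data Fig. 5f), is given below. The Methods do not state the fitting routine. As a swapped-in technique, this sketch estimates $p_c$ and $s$ by linear regression on logit-transformed data, $\ln(1/f - 1) = (p - p_c)/s$, which is exact for noise-free data but weights noisy points differently from a nonlinear least-squares fit. All function names are hypothetical.

```python
import numpy as np

def boltzmann(p, p_c, s):
    """Normalized intact fraction f(p) = 1 / (1 + exp((p - p_c) / s))."""
    return 1.0 / (1.0 + np.exp((p - p_c) / s))

def collapse_spectrum(p, p_c, s):
    """-df/dp: fraction of gas vesicles collapsing per unit pressure (peaks at p_c)."""
    e = np.exp((p - p_c) / s)
    return e / (s * (1.0 + e) ** 2)

def fit_boltzmann_linearized(p, f, eps=1e-3):
    """Estimate (p_c, s) from ln(1/f - 1) = p/s - p_c/s by a straight-line fit."""
    p = np.asarray(p, dtype=float)
    f = np.asarray(f, dtype=float)
    keep = (f > eps) & (f < 1 - eps)  # the logit is undefined at f = 0 and f = 1
    slope, intercept = np.polyfit(p[keep], np.log(1.0 / f[keep] - 1.0), 1)
    s = 1.0 / slope
    return -intercept * s, s

p = np.linspace(0.55, 4.7, 30)          # pressure range used for collapse curves (MPa)
f = boltzmann(p, p_c=2.0, s=0.3)        # synthetic, noise-free collapse curve
print(fit_boltzmann_linearized(p, f))   # recovers approximately (2.0, 0.3)
```

For real data, a nonlinear fit on the untransformed curve is usually preferable because the logit strongly amplifies noise near full or zero collapse.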
Plate-based induction and optical imaging. ARG and GFP constructs were 
transformed as described above, and the transformation mix after recovery 
was plated on two-layer LB-agar plates. The underlayer contained 50 µg ml⁻¹ 
kanamycin, 1.0% L-arabinose and 0.8 mM IPTG. The overlayer contained 
50 µg ml⁻¹ kanamycin and 0.4% glucose. The overlayer was poured 30 min before 
plating, and each layer was 4 mm thick. Plates with transformants were incubated 
at 30°C for 20 h and then imaged for white light scattering and green fluorescence 
using a Chemidoc MP instrument (Bio-Rad). 

Cell growth, viability and microcin production assays. E. coli Nissle 1917 cells 
were transformed by electroporation with pET28 plasmids containing either the 
arg or lux gene cluster under the T5 promoter. Transformed cells were grown in 
5 ml starter cultures in LB medium containing 50 µg ml⁻¹ kanamycin and 1% glucose 
for 16 h at 37°C. The overnight cultures were diluted 1:100 in 50 ml of LB medium 
containing 50 µg ml⁻¹ kanamycin and 0.2% glucose. Cultures were grown at 30°C 
to OD600nm ≈ 0.2–0.3 and induced with 3 µM IPTG (+IPTG), or left uninduced 
(−IPTG). Both induced and uninduced cultures were allowed to grow for 22 h at 
30°C. For time-point optical density measurements, 1 ml of the culture was taken 
out and measured. For plating after 22 h of growth, the cultures were diluted to a 
uniform OD600nm of 0.2, before further serial dilution by a factor of 2 × 10⁴ in LB 
supplemented with 50 µg ml⁻¹ kanamycin and 0.2% glucose. 100 µl of the final 
dilutions were plated on two-layer LB agar plates using a cell spreader. The 
underlayer of the plates contained 50 µg ml⁻¹ kanamycin and 9 µM IPTG. The overlayer 
contained 50 µg ml⁻¹ kanamycin and 0.4% glucose. The overlayer was poured 
30 min before plating, and each layer was 3 mm thick. Cells uniformly spread on 
the two-layer plates were allowed to grow at 30°C for 21 h. Colonies were then 
imaged for light scattering using the Chemidoc MP instrument under white light 
transillumination and a 605 ± 50 nm receive filter, and both opaque (gas vesicle- 
producing) and clear colonies were counted to determine total colony forming 
units per millilitre and the gas vesicle-expressing fraction. Plates had a minimum 
of 82 and a maximum of 475 total colonies, enabling manual counting. 
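The viable-count arithmetic implied by this plating scheme can be sketched as below. The default dilution factor (2 × 10⁴), plated volume (100 µl) and plating OD (0.2) are taken from the text. The function names are hypothetical, and dividing by the plating OD to express counts per OD600nm (as in Extended Data Fig. 8b) is an assumption about how that quantity was computed.

```python
def cfu_per_ml_per_od(total_colonies, dilution_factor=2e4,
                      plated_ml=0.1, od_at_plating=0.2):
    """Colony-forming units per ml of culture, normalized per OD600nm unit."""
    cfu_per_ml = total_colonies * dilution_factor / plated_ml  # culture at the plating OD
    return cfu_per_ml / od_at_plating

def opaque_fraction(opaque, clear):
    """Fraction of colonies that are opaque (gas vesicle-producing)."""
    return opaque / (opaque + clear)

# Illustrative plate with 150 opaque and 50 clear colonies (made-up counts)
print(cfu_per_ml_per_od(150 + 50), opaque_fraction(150, 50))
```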

To assay microcin production, E. coli Nissle 1917 cells containing arg1 or lux 
were cultured as described above and spotted on microcin assay plates containing 
E. coli K-12 H5316 cells (gift from K. Hantke). Wild-type H5316 were grown in 
5 ml LB medium, and H5316 cells transformed with pET plasmid containing 
mWasabi and KanR under a T5 promoter (H5316* cells) were grown in 5 ml LB 
medium containing 50 µg ml⁻¹ kanamycin and 1% glucose for 16 h at 37°C. Two-layer 
LB plates were used to assay the growth inhibition of H5316 cells by microcin 
peptides produced by Nissle 1917 cells. Plates used to assay with wild-type H5316 
cells contained 20 ml of 1% LB agar at the bottom, and the top layer contained 
2 × 10’ H5316 cells in 20 ml of 0.3% LB agar. Plates using H5316* cells contained 
20 ml of 1% LB agar with 50 µg ml⁻¹ kanamycin, 50 µM desferal and 3 µM IPTG, 
and the top layer contained 2 × 10” H5316* cells in 20 ml of 0.3% LB agar with 
50 µg ml⁻¹ kanamycin, 50 µM desferal and 3 µM IPTG. Nissle cells containing 
arg1 or lux genes were cultured at 30°C for 22 h with or without 3 µM IPTG. 
Nissle cells with arg1 were exposed to 1 MPa of hydrostatic pressure to facilitate 
the removal of kanamycin by centrifugation before spotting on H5316 plates. 
Nissle cells containing arg1 or lux, induced or uninduced with IPTG, as well as 
H5316* cells, were washed three times in PBS by pelleting and adjusted to OD600nm = 1 
in LB. All cells were spotted in 2-µl volumes on 5-mm sterile filter paper discs (Bel-Art 
Products) placed on the microcin assay plates. Unsupplemented LB and 
100 mg ml⁻¹ ampicillin (2 µl each) were similarly spotted as controls. After 17 h 
at 37°C, the plates were imaged with the Chemidoc MP instrument with blue 
transillumination, and unfiltered light was collected to form an image. Images 
shown are representative of four experiments each. 

Colony ultrasound. ARG and GFP constructs were transformed into BL21(AI) 
one-shot competent cells (Thermo Fisher Scientific) and plated onto LB agar two-layer 
inducer plates as described above. Plates were grown at 37°C for 14 h. The 
colonies were immobilized by gently depositing a 4 mm layer of 0.5% agarose–PBS 
onto the plate surface. Ultrasound imaging was performed using an L11-4v 128-element 
linear array transducer (Verasonics) to obtain a larger field of view. The 
transducer was mounted on a computer-controlled 3D translatable stage (Velmex). 
Images were acquired with conventional B-mode imaging, using a 
128-ray-line protocol with a synthetic aperture to form a focused excitation beam. 
The transmit waveform was set to a frequency of 6.25 MHz, 67% intra-pulse duty 
cycle, and a four-cycle pulse. Colonies were positioned 20 mm from the transducer 
face, which is the elevation focus of the L11-4v transducer, coupled through a 
layer of PBS. The transmit beam was also digitally focused at 20 mm. For imaging, 
the transmit voltage was 2 V and the f-number was 3, resulting in a peak positive 
pressure of 0.61 MPa. To measure gas vesicle collapse in bacterial colonies as a 
function of acoustic pressure, images were acquired as described above at a peak 
positive pressure of 0.61 MPa after sequentially exposing the samples to collapse 
pulses at 6.25 MHz, with increasing amplitude from 0.61 MPa to 5.95 MPa. Pixel 
gain in the images was set to 0.1 and persistence to 20. Cross-sectional images of the 
plate (perpendicular to the plate surface) were acquired at spatial intervals of 250 µm 
using computer-controlled steps. The cross-sectional images were processed in 
MATLAB to form 2D images of the plate surface. First, the cross-sectional images 
were stacked to produce a 3D volumetric reconstruction of the plate. We then 
summed the signals in a 2-mm slice of the volume parallel to and centred on the 
bacterial growth surface after thresholding to eliminate background, forming a 
2D projection image of the plate. After ultrasound imaging, image processing 
and acoustic phenotype prediction, the colonies were picked using 10-µl sterile 
pipette tips. Each colony was used to inoculate a 5-ml LB culture containing 
50 µg ml⁻¹ kanamycin. DNA was extracted from the cultures by mini-prep 
(PureYield, Promega) and sequenced to determine whether the plasmid contained 
GFP, arg1 or arg2. 
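The projection step described above (stack, threshold, then sum a 2-mm slab centred on the growth surface) can be sketched in NumPy. The study used MATLAB; this Python version is only an illustration. The array layout, the `surface_row` index, the depth pixel size `depth_mm_per_px` and the threshold value are hypothetical inputs, since the Methods do not specify how the growth surface was located or which threshold was applied.

```python
import numpy as np

def colony_projection(cross_sections, surface_row, depth_mm_per_px,
                      threshold, slab_mm=2.0):
    """Form a 2D plate image from stacked B-mode cross-sections.

    cross_sections: array (n_positions, depth, lateral), one image per
    250-um stage step. Returns an array of shape (n_positions, lateral).
    """
    volume = np.asarray(cross_sections, dtype=float)   # 3D reconstruction of the plate
    half = int(round(slab_mm / depth_mm_per_px / 2))    # slab half-thickness in pixels
    lo = max(surface_row - half, 0)
    hi = min(surface_row + half + 1, volume.shape[1])
    slab = volume[:, lo:hi, :]                          # slab centred on the growth surface
    slab = np.where(slab > threshold, slab, 0.0)        # suppress background below threshold
    return slab.sum(axis=1)                             # sum through the slab depth

vol = np.full((4, 40, 6), 0.5)   # background below threshold
vol[2, 20, 3] = 5.0              # bright colony voxel at the growth surface
vol[1, 0, 0] = 9.0               # bright voxel outside the 2-mm slab, so it is ignored
image = colony_projection(vol, surface_row=20, depth_mm_per_px=0.1, threshold=1.0)
print(image.shape, image[2, 3], image.sum())  # only the surface voxel survives
```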

In vivo ultrasound and bioluminescence imaging. All in vivo experiments were 
performed on BALB/c or SCID nude female mice, aged 14-15 weeks, under a 
protocol approved by the Institutional Animal Care and Use Committee of the 
California Institute of Technology. No randomization or blinding was necessary 
in this study. Ultrasound imaging was performed as follows. Mice were 
anaesthetized with 1–2% isoflurane, maintained at 37°C on a heating pad, depilated 
over the imaged region, and imaged using an L22-14v transducer with the pulse 
sequence described above. For imaging of E. coli in the gastrointestinal tract, 
BALB/c mice were placed in a supine position, with the ultrasound transducer 
positioned on the lower abdomen, transverse to the colon. Anatomical landmarks 
including the bladder were used to identify the position of the colon. Before 
imaging, buoyancy-enriched E. coli Nissle 1917 expressing arg1 or lux were mixed 
in a 1:1 ratio with 42°C 4% agarose–PBS for a final bacterial concentration of 
10⁹ cells ml⁻¹. An 8-gauge needle was filled with the mixture of agarose and 
bacteria expressing either arg1 or lux. Before it solidified, a 14-gauge needle was 
placed inside the 8-gauge needle to form a hollow lumen within the gel. After the 
agarose–bacteria mixture solidified at room temperature for 10 min, the 14-gauge 
needle was removed. The hollow lumen was then filled with the agarose–bacteria 
mixture containing cells that express the other imaging reporter (arg1 or lux). After it 
solidified, the complete cylindrical agarose gel was injected into the colon of the mouse 
with a PBS back-filled syringe. The same procedure was used with E. coli BL21 
cells, except with the entire gel homogeneously composed of either arg2- or 
GFP-expressing cells. Introduction of gel into the colon is a common preparatory 
protocol for gastrointestinal ultrasound. 

For imaging of S. typhimurium in tumours, we formed hind-flank ovarian 
tumour xenografts in SCID nude mice via subcutaneous injection of 5 × 10⁷ 
OVCAR8 cells (provided by the National Cancer Institute tumour repository 
with certificate of authentication) with Matrigel. After tumours grew to dimensions 
larger than approximately 6 mm (14 weeks), they were injected with arg1-expressing 
S. typhimurium (50 µl, 3.2 × 10⁹ cells ml⁻¹). The tumours were then 
imaged with ultrasound, with anaesthetized mice in a prone position (homeostasis 
and imaging parameters as described above). Our animal protocol specified that 
animals with total tumour volume exceeding 2 cm³, or showing signs of distress 
as assessed by the veterinary team, be euthanized. 

For luminescence imaging, mice were anaesthetized with 100 mg kg⁻¹ ketamine 
and 10 mg kg⁻¹ xylazine and imaged using a Bio-Rad ChemiDoc MP imager 
without illumination or emission filter, with an integration time of 5 min. The 
image was thresholded and rendered in ImageJ, and superimposed on a bright-field 
image of the mouse using GIMP. 
Image processing. MATLAB was used to process ultrasound images. Regions of 
interest (ROIs) were defined to capture the ultrasound signal from the phantom 
well, colon, or tumour region. All in vitro phantom experiments had the same ROI 
dimensions. For in vivo experiments ROIs were selected consistently to exclude 
edge effects from the colon wall or skin. Mean pixel intensity was calculated from 
each ROI, and pressure-sensitive ultrasound intensity was calculated by subtracting 
the mean pixel intensity of the collapsed image from the mean pixel intensity of 
the intact image. Images were pseudo-coloured, with maximum and minimum 
levels adjusted for maximal contrast as indicated in accompanying colour bars. 
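The ROI calculation just described reduces to a difference of means. This NumPy sketch (the study used MATLAB) assumes that the intact and collapsed images are co-registered arrays of the same shape; the function names and the ROI representation (a boolean mask or a tuple of slices) are hypothetical. The pixel-wise difference is the kind of map shown as a heat-map overlay in Fig. 4e, g, although the display scaling used there is not reproduced.

```python
import numpy as np

def pressure_sensitive_intensity(intact, collapsed, roi):
    """Mean ROI intensity before collapse minus mean ROI intensity after collapse.

    roi may be a boolean mask or a tuple of slices; the same ROI is applied
    to both images.
    """
    intact = np.asarray(intact, dtype=float)
    collapsed = np.asarray(collapsed, dtype=float)
    return float(intact[roi].mean() - collapsed[roi].mean())

def difference_map(intact, collapsed):
    """Pixel-wise pre-collapse minus post-collapse image."""
    return np.asarray(intact, dtype=float) - np.asarray(collapsed, dtype=float)

intact = np.full((40, 40), 3.0)
collapsed = np.full((40, 40), 1.0)
roi = (slice(10, 30), slice(5, 25))  # hypothetical rectangular ROI
print(pressure_sensitive_intensity(intact, collapsed, roi))  # 2.0
```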

For the multiplexed imaging of arg1 and arg2, acoustic spectral unmixing was 
performed as previously described. In brief, a spatial averaging filter (kernel 
size 30 × 30 pixels or 750 × 750 µm) was applied to the three acquired images 
(before collapse, after collapse with 2.7 MPa and after collapse with 4.7 MPa) to 
reduce noise. Then, pixel-wise differences between the first and second images, and 
between the second and third images, were calculated and multiplied by the inverse 
of the collapse matrix, $a$, representing the expected fractional collapse of each 
ARG type at each pressure ($a = \begin{pmatrix} 0.7921 & 0.5718 \\ 0.2079 & 0.4282 \end{pmatrix}$), to produce the 
unmixed pixel intensities corresponding to the contributions from arg2 and arg1. 
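The unmixing step can be sketched in NumPy as follows; this is an illustrative re-implementation, not the authors' MATLAB code. The collapse matrix values are those given above. Treating its first column as arg2 and its second as arg1 is inferred from arg2 collapsing at lower pressure and from the stated output order, and the edge-replicating padding of the box filter is an assumption about how image borders were handled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Expected fractional collapse (rows: 2.7-MPa step, 4.7-MPa step;
# columns: arg2, arg1), using the matrix values given in the Methods.
COLLAPSE_MATRIX = np.array([[0.7921, 0.5718],
                            [0.2079, 0.4282]])

def box_filter(img, k=30):
    """k x k spatial averaging filter; borders padded by edge replication (assumed)."""
    img = np.asarray(img, dtype=float)
    if k <= 1:
        return img
    pad = (k // 2, k - 1 - k // 2)
    padded = np.pad(img, (pad, pad), mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))

def unmix(before, after_low, after_high, k=30, a=COLLAPSE_MATRIX):
    """Return (arg2_map, arg1_map) from images acquired before collapse,
    after the 2.7-MPa collapse and after the 4.7-MPa collapse."""
    b, lo, hi = (box_filter(x, k) for x in (before, after_low, after_high))
    diffs = np.stack([b - lo, lo - hi])  # signal lost at each step, shape (2, H, W)
    unmixed = np.einsum("ij,jhw->ihw", np.linalg.inv(a), diffs)
    return unmixed[0], unmixed[1]

# Synthetic check: build images from known arg2/arg1 maps, then recover them.
rng = np.random.default_rng(0)
c2, c1, base = rng.random((3, 64, 64))
before = base + c2 + c1
after_low = before - (0.7921 * c2 + 0.5718 * c1)
after_high = after_low - (0.2079 * c2 + 0.4282 * c1)
arg2_map, arg1_map = unmix(before, after_low, after_high, k=1)
print(np.allclose(arg2_map, c2), np.allclose(arg1_map, c1))
```

Setting `k=1` disables smoothing so that the recovery can be checked exactly; with the 30-pixel kernel the unmixed maps are spatially averaged versions of the same contributions.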
Statistical analysis. For statistical significance testing, we used two-sided hetero- 
scedastic t-tests with a significance level of type I error set at 0.05 for rejecting the 
null hypothesis. Sample sizes for all experiments, including animal experiments, 
were chosen on the basis of preliminary experiments to be adequate for statistical 
analysis. 
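For reference, the two-sided heteroscedastic (Welch's) t-test uses the statistic and Welch–Satterthwaite degrees of freedom computed below. This standard-library sketch stops at t and df; the two-sided P value then follows from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available. The function name and the sample values are made up for illustration.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

print(welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # t ≈ -1.897, df ≈ 5.88
```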

Code availability. MATLAB code is available from the corresponding author upon 
reasonable request. 

Data and code availability. arg1 and arg2 plasmid sequences are included in 
Supplementary Information, and plasmids will be available from Addgene. 
All other materials are available from the corresponding author upon reasonable 
request. 
Extended Data Figure 1 | Sequence homology of GvpA/B. Amino acid sequence alignment of the primary gas vesicle structural protein GvpB from 
B. megaterium (the GvpA analogue in this species) and GvpA from A. flos-aquae. 
Extended Data Figure 2 | Ultrasound contrast from buoyancy-enriched 
cells. a, Diagram of centrifugation-assisted enrichment of buoyant cells. 
b, Image of arg1 E. coli culture 22 h after induction and 4 h of 
centrifugation at 350g, showing the presence of buoyant cells. Arrowhead 
points to the meniscus layer containing buoyant cells. Experiment 
repeated three times with similar results. c, Ultrasound images of E. coli 
expressing arg1 at various cellular concentrations, with and without 
buoyancy enrichment. Experiment was repeated three times with similar 
results. d, Ultrasound contrast from E. coli expressing arg1, with and 
without buoyancy enrichment, and GFP at various cell densities. Data are 
from three biological replicates; lines represent the mean. 
Extended Data Figure 3 | Time course of acoustic reporter gene contrast 
after induction. a, Ultrasound images of arg1-expressing E. coli at various 
times after induction with IPTG. Experiment repeated four times with 
similar results. b, Ultrasound contrast at each time point. Data are from 
four biological replicates; line represents the mean. Cell concentration, 
5 × 10⁹ cells ml⁻¹. Scale bar, 2 mm. 
Extended Data Figure 4 | Acoustic reporter gene expression and 
ultrasound imaging do not affect cell viability. a, Growth curves of 
E. coli containing the arg1 or GFP expression plasmid, with or without 
induction using 0.4 mM IPTG. Data are from three biological replicates 
per sample; lines represent the mean. b, Representative TEM images of 
whole E. coli cells expressing arg1 with and without exposure to acoustic 
collapse pulses, and E. coli cells expressing GFP. Images were acquired 
from three biologically independent samples for arg1, two for arg1 with 
ultrasound collapse and one for GFP (more than 50 cells imaged per 
sample) with similar results. c, Dark-field optical image of an agar plate 
containing colonies of E. coli expressing arg1 14 h after seeding. d, Image 
of the same plate after the right half of the plate was insonated with 
high-pressure ultrasound. e, Image of the same plate 20 h after insonation. 
f, Image after the right half of the plate in e was insonated with 
high-pressure ultrasound. Zoomed-in images of representative colonies are 
shown below each plate image. Scale bars, 500 nm. Experiment was 
repeated three times with similar results. 
Extended Data Figure 5 | Multiplexed imaging of genetically 
engineered reporter variants. a, Image of arg2 E. coli culture 22 h after 
induction showing the presence of buoyant cells (top). Experiment 
repeated three times with similar results. Mass fraction of gas vesicles 
produced 22 h after induction (bottom). Line represents the mean. 
b, Ultrasound contrast from the whole population of cells expressing 
arg1, arg2 or GFP. Lines represent the mean. c, Ultrasound contrast from 
the buoyancy-enriched population of cells expressing arg1, arg2 or GFP. 
Lines represent the mean. d, Normalized optical density (representing the 
intact fraction) of gas vesicles isolated from E. coli expressing arg1 or arg2 
as a function of applied hydrostatic pressure. e, Normalized ultrasound 
intensity as a function of peak positive pressure from 0.6 to 4.7 MPa for 
E. coli expressing arg1 or arg2. f, Acoustic collapse spectra derived by 
differentiating the data and curves in e with respect to applied pressure. 
a–f, Data are from three biological replicates per sample. d–f, Curves 
represent fits of the data using the Boltzmann sigmoid function to assist 
visualization. 
Extended Data Figure 6 | Anatomical ultrasound images of acoustic 
bacteria in the gastrointestinal tract. Raw images underlying the 
difference maps shown in Fig. 4e, g. The cyan outline identifies the colon 
region of interest for difference processing. This experiment was repeated 
three times with similar results. 
Extended Data Figure 7 | Ultrasound imaging of ARG-expressing cells in the mouse colon. a, Transverse ultrasound images of mice whose colon
contains BL21 E. coli expressing either arg2 or GFP at a final concentration of 10⁹ cells ml⁻¹. A difference heat map of ultrasound contrast
within the colon region of interest before and after acoustic collapse is overlaid on a grayscale anatomical image. b, Signal intensity in
mice with E. coli expressing either arg2 or GFP. Data are from 5 biological replicates per sample. P = 0.02 using a two-sided
heteroscedastic t-test. Scale bar, 2 mm.
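The difference maps in Extended Data Figs 6 and 7 compare contrast before and after acoustic collapse within a colon region of interest (ROI). The sketch below shows that arithmetic under three stated assumptions: frames are equal-sized 2D lists of intensities, the ROI is a boolean mask, and the per-animal "signal intensity" is the mean pre-minus-post difference inside the ROI. The function names and that summary statistic are hypothetical, because the legend does not specify the exact processing.

```python
def difference_map(pre, post, roi):
    """Pixel-wise pre-minus-post contrast, set to 0.0 outside the region of interest."""
    return [
        [(a - b) if inside else 0.0 for a, b, inside in zip(row_pre, row_post, row_roi)]
        for row_pre, row_post, row_roi in zip(pre, post, roi)
    ]

def mean_roi_signal(diff, roi):
    """Average collapse-sensitive contrast over the pixels inside the ROI."""
    values = [d for row_d, row_r in zip(diff, roi) for d, inside in zip(row_d, row_r) if inside]
    return sum(values) / len(values)

# Hypothetical 2 x 3 frames (arbitrary units) and ROI mask.
pre = [[5.0, 6.0, 2.0], [4.0, 7.0, 1.0]]
post = [[1.0, 2.0, 2.0], [4.0, 3.0, 1.0]]
roi = [[True, True, False], [False, True, False]]
diff = difference_map(pre, post, roi)
print(mean_roi_signal(diff, roi))
```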
Extended Data Figure 8 | Effect of arg1 and lux expression on ECN
cell growth, viability and microcin release. a, Optical density at 600 nm
measured from 0 to 22 h after induction with 3 μM IPTG, or without
induction, in ECN cells transformed with arg1 or lux. Data are from
four biological replicates per time point; lines represent the mean. For
comparisons between induced arg1 and induced lux values at 22 h,
P = 0.12. For comparisons between uninduced arg1 and uninduced
lux at 22 h, P = 0.04. For comparisons at all other time points, P > 0.14.
b, Colony-forming units (cfu) per millilitre of culture per OD600 nm after
22 h of induction with 3 μM IPTG, or uninduced growth, of ECN cells
transformed with arg1 or lux. P > 0.22. Data are from seven biological
replicates for arg1 samples and four biological replicates for lux samples.
Lines represent the mean. c, Fraction of opaque, gas vesicle-producing
colonies produced by plating arg1-transformed ECN cells 22 h after
induction with 3 μM IPTG, or uninduced growth. Cells were plated on
dual-layer IPTG induction plates, allowed to grow overnight at 30 °C
and imaged as in Extended Data Fig. 4c-f. P = 0.12. Data are from seven
biological replicates; lines represent the mean. d, Microcin release assay
using a uniform layer of the indicator strain E. coli K12 H5316 in soft
agar, after 17-h incubation with filters containing microcin sources
and controls, as indicated. ECN cells transformed with arg1 or lux were
induced for 22 h with 3 μM IPTG, or grown without induction, before
spotting. H5316* indicates H5316 cells transformed with mWasabi and
cultured for 22 h as with ECN cells. All cells were washed before spotting
to remove antibiotic. Experiment was performed four times with similar
results. Amp, 100 mg ml⁻¹ ampicillin; LB, LB medium. e, As in d, but with
the indicator strain comprising H5316* cells and the agar containing
50 μg ml⁻¹ kanamycin, 3 μM IPTG and 50 μM desferal, to show that
microcin release also occurs during transgene expression. Note that the
H5316* spot appears bright because the plate image is acquired with blue-
light transillumination, resulting in mWasabi fluorescence. Experiment
was performed four times with similar results. All P values were calculated
using a two-sided heteroscedastic t-test.
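Several legends here report P values from a "two-sided heteroscedastic t-test", that is, Welch's unequal-variance t-test. The stdlib-only Python sketch below is for illustration. It computes the Welch statistic and the Welch-Satterthwaite degrees of freedom, then obtains a two-sided P value by numerically integrating the Student t density with Simpson's rule. The helper names are hypothetical. For real analyses, a vetted routine such as `scipy.stats.ttest_ind(x, y, equal_var=False)` would be the usual choice.

```python
import math

def _t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

def _t_area(t_abs, df, steps=20000):
    """Simpson's-rule area under the t density between 0 and t_abs (steps must be even)."""
    if t_abs == 0:
        return 0.0
    h = t_abs / steps
    total = _t_pdf(0.0, df) + _t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    return total * h / 3

def welch_t_test(x, y):
    """Welch's t statistic, Welch-Satterthwaite df and two-sided P value.

    Requires at least two values per group and a non-zero combined variance.
    """
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # sample variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sx, sy = vx / nx, vy / ny  # squared standard errors of the means
    t = (mx - my) / math.sqrt(sx + sy)
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    # The density is symmetric, so the two-tailed area is 1 - 2 * area(0, |t|).
    p = max(0.0, 1.0 - 2.0 * _t_area(abs(t), df))
    return t, df, p

# Hypothetical replicate values, not data from the paper.
t, df, p = welch_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f}, df = {df:.2f}, P = {p:.4f}")
```

Unlike Student's pooled t-test, Welch's version does not assume equal variances. This is why its degrees of freedom are generally non-integer.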
Extended Data Figure 9 | Ultrasound imaging of S. typhimurium in tumour xenografts. a, Diagram of tumour imaging experiment.
S. typhimurium expressing arg1 were introduced into the tumours of mice and imaged with ultrasound. b, Ultrasound images of a gel phantom
containing S. typhimurium expressing arg1 or the lux operon. Cell concentration is 10⁹ cells ml⁻¹. Experiment repeated three times with
similar results. c, TEM images of whole S. typhimurium cells expressing arg1 with and without exposure to acoustic collapse pulses. At least
20 cellular images were acquired for each sample type (from one biological preparation each) with similar results. d, Ultrasound images of
mouse OVCAR8 tumours injected with 50 μl of 3.2 × 10⁹ cells ml⁻¹ arg1-expressing S. typhimurium, before and after acoustic collapse.
Experiment repeated five times with similar results. e, Collapse-sensitive ultrasound contrast in tumours injected with arg1-expressing or
lux-expressing cells. Data are from five animals; line represents the mean. P = 0.002 using a two-sided heteroscedastic t-test. Scale bars,
2 mm (b), 500 nm (c) and 2.5 mm (d).
Extended Data Figure 10 | High-throughput screening of acoustic
phenotypes. a, Ultrasound intensity histogram of 22 randomly picked
colonies. Colonies with low contrast were predicted to contain the gene
encoding GFP, and those with high contrast to contain arg1 or arg2.
b, Normalized change in ultrasound intensity (U) for each of
the 15 arg1 or arg2 colonies after insonation at increasing pressures.
At 4 MPa, colonies with signal above the indicated threshold were
predicted to be arg1, and those below it to be arg2. This experiment was performed
once; each colony was treated as a biological replicate.
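The two-step genotype call described for this screen can be expressed as a pair of threshold comparisons. In the sketch below, the thresholds are made-up values, because the legend refers only to an "indicated threshold". The input `signal_at_4mpa` is assumed to be the plotted normalized value at 4 MPa. Both the function and its inputs are hypothetical.

```python
def classify_colony(baseline_contrast, signal_at_4mpa, contrast_threshold, collapse_threshold):
    """Two-step call following the legend: low contrast -> GFP; otherwise arg1 vs arg2."""
    if baseline_contrast < contrast_threshold:
        return "GFP"
    # Above the 4 MPa threshold -> arg1; at or below it -> arg2.
    return "arg1" if signal_at_4mpa > collapse_threshold else "arg2"

# Hypothetical colonies: (baseline contrast, normalized signal at 4 MPa); thresholds are made up.
colonies = [(0.05, 0.0), (0.90, 0.70), (0.85, 0.20)]
calls = [classify_colony(c, s, contrast_threshold=0.3, collapse_threshold=0.5) for c, s in colonies]
print(calls)
```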


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


doi:10.1038/nature25015 


Cyclin D-CDK4 kinase destabilizes PD-L1 via
Cullin 3-SPOP to control cancer immune surveillance


Jinfang Zhang, Xia Bu, Haizhen Wang, Yasheng Zhu, Yan Geng, Naoe Taira Nihira, Yuyong Tan, Yanpeng Ci,
Fei Wu, Xiangpeng Dai, Jianping Guo, Yu-Han Huang, Caoqi Fan, Shancheng Ren, Yinghao Sun, Gordon J. Freeman,
Piotr Sicinski and Wenyi Wei


Treatments that target immune checkpoints, such as the one 
mediated by programmed cell death protein 1 (PD-1) and its 
ligand PD-L1, have been approved for treating human cancers 
with durable clinical benefit¹,². However, many cancer patients
fail to respond to anti-PD-1/PD-L1 treatment, and the underlying
mechanisms are not well understood³⁻⁵. Recent studies revealed
that response to PD-1/PD-L1 blockade might correlate with
PD-L1 expression levels in tumor cells⁶,⁷. Hence, it is important
to mechanistically understand the pathways controlling PD-L1 
protein expression and stability, which can offer a molecular basis 
to improve the clinical response rate and efficacy of PD-1/PD-L1 
blockade in cancer patients. Here, we report that PD-L1 protein 
abundance is regulated by cyclin D-CDK4 and the Cullin 3-SPOP E3
ligase via proteasome-mediated degradation. Inhibition of CDK4/6 
in vivo elevates PD-L1 protein levels, largely by inhibiting cyclin 
D-CDK4-mediated phosphorylation of SPOP and thereby promoting 
SPOP degradation by APC/C-Cdh1. Loss-of-function mutations in
SPOP compromise ubiquitination-mediated PD-L1 degradation, 
leading to increased PD-L1 levels and reduced numbers of tumor- 
infiltrating lymphocytes (TILs) in mouse tumors and in primary 
human prostate cancer specimens. Notably, combining CDK4/6 
inhibitor treatment with anti-PD-1 immunotherapy enhances tumor 
regression and dramatically improves overall survival rates in mouse 
tumor models. Our study uncovers a novel molecular mechanism 
for regulating PD-L1 protein stability by a cell cycle kinase and 
reveals the potential for using combination treatment with CDK4/6 
inhibitors and PD-1/PD-L1 immune checkpoint blockade to enhance 
therapeutic efficacy for human cancers. 

Deregulated cell cycle progression is a hallmark of human cancer, and
targeting cyclin-dependent kinases (CDKs) to block cell proliferation
has been validated as an effective anti-cancer therapy⁸. Although it
has been reported that PD-L1 expression can be regulated at both
transcriptional⁹,¹⁰ and post-translational levels¹¹,¹², it remains unclear
whether PD-L1 stability is regulated under physiological conditions
such as during cell cycle progression. We found that PD-L1 protein
abundance fluctuated during the cell cycle in multiple human cancer cell
lines, peaking in M/early G1 phases, followed by a sharp reduction in
late G1/S phases (Fig. 1a-d; Extended Data Fig. 1a-g). Elevated PD-L1
protein abundance was also observed in multiple mouse tumor-derived
cell lines arrested in M phase by nocodazole or taxol¹³ (Extended Data
Fig. 1h-m).

Cyclin-dependent kinases play crucial roles in regulating the
stability of cell cycle-related proteins during cell cycle progression¹⁴,¹⁵.
Therefore, we adopted a genetic method to ablate each major cyclin
and found that ablating all three D-type cyclins (D1, D2 and D3), but
not cyclin A (A1 and A2) or cyclin E (E1 and E2), strongly elevated
PD-L1 protein abundance in mouse embryonic fibroblasts (MEFs)
(Fig. 2a and Extended Data Fig. 2a-e). Using MEFs lacking individual
D-type cyclins, we observed that depletion of cyclin D1, and to a lesser
extent cyclin D2 or D3, upregulated PD-L1 protein levels (Fig. 2b, c).
Conversely, reintroduction of cyclin D1, and to a lesser extent cyclin
D2 or D3, suppressed PD-L1 protein abundance in cyclin D1-/-D2-/-D3-/-
MEFs (Extended Data Fig. 2f). In further support of a physiological
role for cyclin D1 in negatively regulating PD-L1 protein levels in vivo,
mammary tumors arising in cyclin D1-/- MMTV-Wnt-1 or MMTV-c-Myc
mice displayed elevated PD-L1 protein levels, as compared to
tumors arising in cyclin D1+/+ animals (Fig. 2d and Extended Data
Fig. 2g).

Depletion of the cyclin D catalytic partner, cyclin-dependent kinase 4
(CDK4)¹⁶, but not CDK6¹⁶ or the cyclin A and cyclin E binding partner
CDK2¹⁷, also increased PD-L1 protein abundance in cells (Fig. 2e, f;
Extended Data Fig. 2h-j). Conversely, ectopic expression of wild-type
CDK4, but not the kinase-dead N158F mutant, decreased PD-L1 levels
(Extended Data Fig. 2k, l). Furthermore, treatment of multiple cancer
cell lines with two different selective inhibitors of CDK4/6 kinase,
palbociclib or ribociclib⁸, upregulated PD-L1 protein abundance and
stability even in pRB knock-down cells (Fig. 2g, h; Extended Data
Fig. 2m-q).

Rb is frequently inactivated in human cancers¹⁸,¹⁹. In agreement
with previous reports²⁰,²¹, we found that Rb-deficient cancer cells
often displayed high levels of the cyclin D-CDK4/6 inhibitor p16INK4A.
Consistent with the notion that cyclin D1-CDK4 kinase suppresses
PD-L1 levels, we observed that upregulation of p16INK4A correlated with
elevated PD-L1 levels. Moreover, in Rb-proficient/p16-low cancer cell
lines, higher PD-L1 levels correlated with relatively low CDK4 expression
(Extended Data Fig. 2r). In addition, ectopic expression of p16INK4A
in Rb-proficient/p16-low cell lines (MCF7 and T47D) or an Rb-deficient/p16-low
cell line (HLF) elevated PD-L1 protein abundance (Extended
Data Fig. 2s-u), whereas depletion of p16INK4A in Rb-deficient/p16-high
cell lines (MDA-MB-436, BT549 and HCC1937) had the opposite effect
(Extended Data Fig. 2v-x), further documenting an inverse correlation
between CDK4 activity and PD-L1 expression.

To extend these observations to an in vivo setting, we treated
MMTV-ErbB2 mice bearing autochthonous breast cancers, or mice
carrying allografts of murine MC38 or B16-F10 cancer cell lines, with
palbociclib and monitored PD-L1 levels. Inhibition of CDK4/6 led to a
significant upregulation of PD-L1 in all these cancer models, which was
accompanied by a reduction in the number of infiltrating CD3+ TILs
(Fig. 2i-k; Extended Data Fig. 3a-c). We also observed that palbociclib
treatment significantly elevated PD-L1 protein levels in various organs
of normal mice (Extended Data Fig. 3d-h). Collectively, these results
demonstrate that cyclin D-CDK4 kinase plays a rate-limiting role in
regulating PD-L1 levels in vivo.

To understand how cyclin D-CDK4 regulates PD-L1 levels, we first
determined that treatment of cells with the proteasome inhibitor MG132,
or with the cullin-based ubiquitin E3 ligase inhibitor MLN4924²², elevated
PD-L1 protein levels (Fig. 3a). To identify which cullin family E3
ligase(s) regulates PD-L1, we screened for potential interactions of
PD-L1 with each cullin family protein and found that Cullin 3, and to a
lesser extent Cullin 1, interacted with PD-L1 in cells (Fig. 3b, Extended
Data Fig. 4a, b). These results indicate that, in addition to
Cullin 1/β-TRCP¹¹, Cullin 3-based E3 ligase(s) might play a role in regulating
PD-L1 stability. Consistent with this notion, depletion of Cullin 3
elevated the protein abundance of endogenous PD-L1 (Extended Data
Fig. 4c).

Cullin 3-based E3 ubiquitin ligases recognize their downstream
substrates through substrate-recruiting adaptor proteins²³. We found
that SPOP, but not the other adaptor proteins examined, interacted with
PD-L1 in cells (Fig. 3c, d). We further determined that deletion of the
C-tail, or of the last eight amino acids of PD-L1 (283-290), disrupted
binding of PD-L1 to SPOP and rendered PD-L1 resistant to
SPOP-mediated poly-ubiquitination (Extended Data Fig. 4d-h), indicating
that the 283-290 region of PD-L1 might represent the binding
motif for SPOP. Importantly, the cancer-derived PD-L1 T290M
mutant (cBioPortal), located within the SPOP-binding motif, also lost
its ability to interact with SPOP and became more stable through
decreased SPOP-mediated poly-ubiquitination and degradation
(Extended Data Fig. 4i-l). Furthermore, depleting SPOP or deleting its
substrate-interacting MATH domain elevated and stabilized PD-L1 in
cells (Fig. 3e, f; Extended Data Fig. 5a-m). However, depleting known
SPOP substrates, including AR, ERG, Trim24 or DEK, in SPOP-WT or
SPOP-/- cells did not lead to obvious changes in PD-L1 levels (Extended
Data Fig. 5n-u), arguing against the possibility that the observed
elevation of PD-L1 upon SPOP depletion is a secondary effect.

SPOP mutations occur in 10-15% of human prostate cancers and are
largely clustered within the MATH domain²⁴,²⁵ (Extended Data Fig. 6a).
Notably, these cancer-derived SPOP mutants failed to promote PD-L1
degradation owing to their deficiency in binding to PD-L1 and promoting
PD-L1 poly-ubiquitination (Fig. 3g-i and Extended Data Fig. 6b, c).
This resembles the Elongin C-encoding TCEB1 hotspot mutants
in clear cell renal carcinoma, which impair the ability
of the Cullin 2/Elongin B/C/VHL E3 ligase complex to promote HIF1α
degradation²⁶. We also observed that mutations in the PD-L1 C-tail
(degron) are mutually exclusive with mutations in the substrate-
interacting MATH domain of SPOP (Extended Data Fig. 6d, e).

To further explore the impact of SPOP mutations on tumorigenesis,
we generated tumor cell lines expressing SPOP-WT or cancer-derived
mutants. We found that cells expressing cancer-derived SPOP mutants
displayed elevated levels of endogenous PD-L1 protein, as compared to
cells expressing SPOP-WT (Fig. 3j and Extended Data Fig. 6f-j). Upon
inoculation into immunoproficient mice, implanted tumors expressing
cancer-derived SPOP-F102C grew faster than tumors
expressing SPOP-WT (Fig. 3k and Extended Data Fig. 6k). Tumors
expressing the cancer-derived SPOP-F102C mutant displayed elevated
PD-L1 levels and significantly reduced numbers of CD3+ TILs (Fig. 3l
and Extended Data Fig. 6l). Strikingly, the difference in tumor weights
between SPOP-WT and SPOP-F102C groups was largely alleviated
after treatment with anti-PD-L1 antibody (Extended Data Fig. 6m-p),
or when tumor cells were inoculated into T cell-deficient Tcra-/- mice
(Extended Data Fig. 6q-s). Hence, the enhanced tumorigenic potential of
SPOP-mutant cells is largely caused by elevated PD-L1 levels, resulting
in increased immune evasion.
We next explored whether loss-of-function SPOP mutations regulate
PD-L1 levels or TILs in primary human prostate cancers. To this
end, we identified 15 SPOP-mutant and 82 SPOP-wild-type tumors
through large-scale sequencing as described²⁷,²⁸. IHC staining
revealed that approximately 80% of SPOP-mutant tumors exhibited
strong PD-L1 staining, whereas only approximately 10% of SPOP-WT
tumors exhibited strong staining for PD-L1, and 70% of SPOP-WT
cases displayed weak or no PD-L1 staining (Fig. 3m, n; Extended Data
Fig. 7a-d). Moreover, the numbers of CD8+ TILs were reduced in samples
harboring SPOP mutations, as compared to SPOP-WT tumors
(Fig. 3o; Extended Data Fig. 7e-h). These results indicate that SPOP
deficiency correlates with elevated PD-L1 protein abundance and
decreased numbers of TILs in primary human prostate cancers.

We further found that SPOP protein abundance fluctuated during 
the cell cycle and displayed an inverse correlation with PD-L1 protein 
levels (Fig. 1a and Fig. 4a); depleting SPOP resulted in stabilization of 
PD-L1 across the cell cycle (Fig. 4a and Extended Data Fig. 8a). We 
noted that the Anaphase-Promoting Complex/Cyclosome (APC/C) E3 
ligase adaptor protein Cdh1 displayed an inverse correlation with SPOP 
protein levels during the cell cycle (Figs 1a, c and 4a). Furthermore,
depletion of Cdh1, but not Cdc20, elevated SPOP protein abundance, 
which was accompanied by a simultaneous reduction in PD-L1 protein 
levels (Extended Data Fig. 8b, c). Consistent with these results, we 
detected a physical interaction between the endogenous SPOP and 
Cdh1 proteins (Fig. 4b and Extended Data Fig. 8d, e), and identified an 
evolutionarily conserved destruction-box motif (D-box: RxxLxxxxN)
in SPOP (Extended Data Fig. 8f). Deleting the D-box motif in SPOP 
disrupted its binding to Cdh1 and rendered SPOP resistant to Cdh1- 
mediated poly-ubiquitination and degradation (Fig. 4c, d; Extended 
Data Fig. 8g-i). Moreover, depletion of Cdh1 led to SPOP stabilization, 
which subsequently resulted in a reduction in PD-L1 protein levels
during cell cycle progression (Fig. 4e). Taken together, these results 
indicate that Cdh1 is a physiologically important upstream E3 ligase 
responsible for negatively regulating SPOP protein stability. 
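The D-box consensus quoted above, RxxLxxxxN (where x is any residue), maps directly onto a regular expression. The sketch below scans a one-letter protein sequence for candidate matches and uses a lookahead so that overlapping candidates are also reported. The peptide in the example is invented and is not the SPOP sequence, and `DBOX_PATTERN` and `find_dbox_motifs` are hypothetical names. A regex hit only flags a candidate. In the text, the conservation analysis and the D-box deletion experiments are what support a functional motif.

```python
import re

# RxxLxxxxN, where "." stands for any residue; the lookahead lets matches overlap.
DBOX_PATTERN = re.compile(r"(?=(R..L....N))")

def find_dbox_motifs(sequence):
    """Return (1-based start, peptide) for each RxxLxxxxN candidate in a one-letter sequence."""
    return [(m.start() + 1, m.group(1)) for m in DBOX_PATTERN.finditer(sequence.upper())]

# Invented peptide for illustration; not the SPOP sequence.
print(find_dbox_motifs("MARAALSTEVNGK"))
```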

To elucidate how the cyclin D-CDK4 kinase affects this mechanism,
we established that cyclin D1-CDK4 directly phosphorylates SPOP at
Ser6, but not Ser222, the only two conserved serine-proline sites in
SPOP (Fig. 4f and Extended Data Fig. 9a-d). Consistently, treatment of
cells with the CDK4/6 inhibitor palbociclib reduced the phosphorylation
of SPOP in cells (Extended Data Fig. 9e). We observed that 14-3-3γ
protein physically interacted with SPOP in a pSer6-dependent manner
and disrupted the interaction of SPOP with Cdh1 in cells (Fig. 4g, h;
Extended Data Fig. 9f-h). Blocking SPOP Ser6 phosphorylation decreased the
interaction of SPOP with 14-3-3γ and increased its binding to Cdh1,
leading to elevated SPOP poly-ubiquitination (Fig. 4i; Extended Data
Fig. 9i-p). Consequently, palbociclib treatment decreased SPOP protein
abundance and elevated PD-L1 levels in SPOP-WT, but not SPOP-deficient,
cells (Fig. 4j). Moreover, depletion of 14-3-3γ dramatically
upregulated PD-L1 levels and stabilized PD-L1 during cell cycle
progression (Extended Data Fig. 9q-t).

Recent clinical studies revealed that the success of PD-1/PD-L1 blockade
correlates with PD-L1 expression levels in tumor cells⁶,⁷. Given
our observation that inhibition of CDK4/6 elevated PD-L1 levels, we
hypothesized that inhibitors of CDK4/6 kinase might synergize with
anti-PD-1/PD-L1 therapy to elicit an enhanced therapeutic effect.
Notably, we observed that treatment of immunoproficient mice bearing
CT26 tumors with palbociclib plus anti-PD-1 antibody dramatically
retarded tumor progression and resulted in 8 complete responses out
of 12 treated mice (Fig. 4k; Extended Data Fig. 10a). Moreover, combining
the CDK4/6 inhibitor with anti-PD-1 therapy resulted in a significant
improvement in overall survival compared to the single-agent treatment
groups (Fig. 4l). Similar results were obtained using mice bearing tumors
derived from MC38 cells (Extended Data Fig. 10b, c). As expected
from our earlier observations, treatment of tumor-bearing mice with
palbociclib decreased the absolute numbers of TILs, including CD3+,
CD4+, CD8+, granzyme B+ and IFNγ+ cells. Importantly, addition of
anti-PD-1 antibody to palbociclib treatment restored essentially normal
numbers of TILs (Extended Data Fig. 10d-j).

A recent study revealed that another inhibitor of CDK4/6,
abemaciclib, increased the immunogenicity of cancer cells via an
Rb-dependent mechanism that activates tumor cell expression of
endogenous retroviral elements, thereby stimulating production of
type III interferons and antigen presentation by tumor cells²⁹. Together
with our demonstration that cyclin D-CDK4 regulates PD-L1 stability
through Cullin 3-SPOP (Extended Data Fig. 10k), these studies provide
a complementary molecular rationale for combining CDK4/6 inhibitor
treatment with anti-PD-1/PD-L1 immunotherapy to enhance tumor
regression.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
Figure 1 | The protein abundance of PD-L1 fluctuates during cell cycle
progression. a, c, Immunoblot (IB) analysis of whole cell lysates (WCL)
derived from HeLa cells synchronized in M phase by nocodazole (a) or in
late G1/S phase by double thymidine (c), followed by release back into
the cell cycle. b, d, The cell-cycle profiles in (a) or (c) were monitored by
fluorescence-activated cell sorting (FACS).
Figure 2 | Cyclin D-CDK4 negatively regulates PD-L1 protein stability.
a-d, IB analysis of WCL derived from wild-type versus combined
(cyclin D1-/-D2-/-D3-/-) (a) or single-isoform cyclin D knockout MEFs
(b), MDA-MB-231 cells depleted of cyclin D1 or cyclin D3 using shRNAs (c),
or MMTV-Wnt1-induced mouse mammary tumors with or without genetic
depletion of cyclin D1 (d). e-h, IB analysis of WCL derived from wild-type
versus Cdk4-/- MEFs (e), MDA-MB-231 cells depleted of CDK4 using
shRNAs (f), or multiple breast cancer cell lines treated with palbociclib
(0.5 or 1 μM) for 48 hours (g, h). i, j, Immunofluorescence staining of
PD-L1 and CD3 in mouse mammary tumors induced by MMTV-ErbB2
treated with vehicle or palbociclib as described in Methods (i), and
quantification of the CD3+ T cell population (j). Scale bar, 50 μm. k, FACS
analysis for PD-L1 or CD3+ T-cell populations from MC38 implanted
tumors treated with vehicle or palbociclib for 7 days. Vehicle, n = 4 for
(i, j) or 7 mice for (k); palbociclib, n = 4 for (i, j) or 7 mice for (k). Error
bars, ± s.d.; **P < 0.01, ***P < 0.001 (two-tailed t-test).


00 MONTH 2017 | VOL 000 | NATURE | 5 


© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved 




[Figure 3, panels a-o: immunoblots (a-i), tumor FACS and weight plots (j-l), IHC images (m) and IHC quantification (n, o; PD-L1 staining scored as strong, intermediate, weak or negative; SPOP mutant n = 15, SPOP WT n = 82)]


Figure 3 | Cullin 3-SPOP is the physiological E3 ubiquitin ligase for
PD-L1. a-d, IB analysis of WCL derived from C4-2 cells treated with MG132
(10 μM) or MLN4924 (141M) for 12 hours (a), immunoprecipitates (IP)
and WCL derived from 293T cells transfected with indicated constructs
(b, d), or anti-PD-L1 IP and WCL derived from PC3 cells (c). Cells were
treated with MG132 (10 μM) for 12 hours in b, c. e-g, IB analysis of WCL
derived from Spop+/+ versus Spop-/- MEFs (e), C4-2 cells depleted of SPOP
with sgRNAs (f), or C4-2 cells stably expressing the indicated SPOP WT
and mutants (g). h, i, IB analysis of IP and WCL derived from 293T cells
(h), or Ni-NTA pull-down products derived from PC3 cells (i), transfected
with indicated constructs and treated with 30 μM MG132 for 6 hours.
j-l, FACS analysis for PD-L1 (j) or the CD3+ T-cell population (l) of
B16-F10 implanted tumors ectopically expressing SPOP-WT or the F102C
mutant (n = 6 mice per group). Tumor weights were recorded at the time
of sacrifice (k) (n = 5 mice per group). m, Representative images of
PD-L1 and CD8 immunohistochemistry (IHC) staining in SPOP wild-type
or mutant primary human prostate cancer samples. Scale bars,
400 μm or 100 μm. n, o, Quantification of IHC analysis for PD-L1 (n)
and CD8+ T cells (o) in SPOP wild-type versus mutant human prostate
tumor specimens (n = 15 for SPOP mutant, n = 82 for SPOP WT). Error
bars, ± s.d.; two-tailed t-test, except Mann-Whitney test in n; *P < 0.05,
**P < 0.01, ***P < 0.001.
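Fig. 3n compares ordinal IHC scores with a Mann-Whitney test. The sketch below implements a minimal version of that test. It assigns midranks to ties and computes the two-sided P value from the tie-corrected normal approximation. Statistical packages may instead report exact P values for small samples. The function names and the example scores (0 = negative to 3 = strong) are hypothetical.

```python
import math

def midranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U statistic for x and a two-sided P value from the tie-corrected normal approximation."""
    nx, ny = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u_x = sum(ranks[:nx]) - nx * (nx + 1) / 2
    n = nx + ny
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    # The variance is zero (and undefined for the test) if every value is tied.
    var_u = nx * ny / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - nx * ny / 2) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical ordinal IHC scores (0 = negative ... 3 = strong), not data from the paper.
mutant = [3, 3, 3, 2]
wild_type = [0, 1, 1, 2, 0, 3]
print(mann_whitney_u(mutant, wild_type))
```

A rank-based test suits ordinal staining categories because it uses only their order, not the spacing between "weak" and "strong".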
Figure 4 | Cyclin D-CDK4-mediated phosphorylation of SPOP
stabilizes SPOP largely through recruiting 14-3-3γ to disrupt its
binding with Cdh1. a-e, IB of WCL derived from HeLa cells with or without
depletion of SPOP (a) or Cdh1 (e) synchronized in M phase by nocodazole
treatment prior to release for the indicated times; IP and WCL derived
from MDA-MB-231 (b) or 293T (c) cells; or Ni-NTA pull-down products
derived from HeLa cells transfected with the indicated constructs (d).
Cells were treated with MG132 (30 μM) for 6 hours in b-d. f, In vitro kinase
assays showing that cyclin D1/CDK4 phosphorylates recombinant SPOP
at Ser6, not Ser222. g-j, IB analysis of IP and WCL derived from 293T cells
transfected with indicated constructs and treated with MG132 (10 μM)
with or without palbociclib (1 μM) for 12 hours (g-i), or HeLa cells
with or without depletion of SPOP treated with palbociclib (0.5 or 1 μM) for
48 hours (j). k, CT26 implanted tumor-bearing mice were enrolled in
different treatment groups as indicated. Tumor volumes of mice treated
with control antibody (n = 13), anti-PD-1 mAb (n = 14), the CDK4/6
inhibitor palbociclib (n = 12) or combined therapy (n = 12) were
measured every three days and plotted individually against days after CT26
tumor cell injection. We repeated this experiment twice. l, Kaplan-Meier
survival curves for each treatment group demonstrate the improved efficacy
of combining PD-1 mAb with the CDK4/6 inhibitor palbociclib. ***P < 0.001
(Gehan-Breslow-Wilcoxon test). We repeated this experiment twice.
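Fig. 4l summarizes survival with Kaplan-Meier curves. The product-limit estimator behind such curves is sketched below. It assumes each mouse contributes a follow-up time and an event flag (1 = death or humane endpoint, 0 = censored). The group comparison in the legend uses the Gehan-Breslow-Wilcoxon test, which is not implemented here. `kaplan_meier` and the example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per subject; events: 1 if the endpoint occurred, 0 if censored.
    Returns (time, survival) steps, starting from (0.0, 1.0).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time; censored ones still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up (days) for five mice; 0 marks censoring.
print(kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]))
```

Animals censored at a death time are still counted as at risk at that time. This follows the usual product-limit convention.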
METHODS

Cell culture, transfections and viral infections. HEK293T, HEK293, HeLa,
MDA-MB-231, MCF7, Hs578T, WT MEFs, cyclin D1-/- MEFs, cyclin D2-/- MEFs,
cyclin D3-/- MEFs, cyclin D1-/-D2-/-D3-/- MEFs, cyclin DI“"D2-/- D3",
Cdk4+/+ and Cdk4-/- MEFs, cyclin A1+/+A2+/+ and cyclin A1-/-A2-/- MEFs,
cyclin E1+/+E2+/+ and cyclin E1-/-E2-/-, and Spop+/+ and Spop-/- MEFs (a kind gift of
Dr. Nicholas Mitsiades, Baylor College of Medicine, Houston, TX) were cultured
in DMEM medium supplemented with 10% FBS (Gibco), 100 units of penicillin
and 100 μg/ml streptomycin (Gibco). HLF, HepG2, Huh1 and Huh7 were cultured
in RPMI medium supplemented with 10% FBS. MDA-MB-231 PD-L1 WT and
PD-L1 KO cells were a kind gift from Dr. Mien-Chie Hung. BT549, T47D, ZR75-1,
HCC1954, HCC1937, MDA-MB-436, MDA-MB-468 and SKBR3 cells were from
Dr. Alex Toker's laboratory at BIDMC, Harvard Medical School, and cultured in
RPMI medium or McCoy's 5A (Corning, NY) medium supplemented with 10%
FBS. PC3, DU145, 22RV1, LNCaP and C4-2 were kind gifts from Dr. Pier Paolo
Pandolfi's group at BIDMC, Harvard Medical School, and cultured in RPMI medium
(Corning, NY) with 10% FBS. The mouse tumor-derived MC38 cell line was a kind
gift from Dr. Arlene Sharpe at Harvard Medical School. The mouse tumor-derived 4T1
and B16-F10 cell lines were routinely cultured in Gordon Freeman's laboratory in
DMEM medium supplemented with 10% FBS (Gibco), 100 units of penicillin and
100 μg/ml streptomycin (Gibco). All cell lines were routinely tested to be negative
for mycoplasma contamination.

Cells at 80% confluence were transfected using Lipofectamine Plus reagents in
Opti-MEM medium (Invitrogen). 293FT cells were used to package lentiviruses
and retroviruses expressing cDNAs, which were subsequently used to infect various
cell lines. Briefly, medium containing secreted viruses was collected
twice, at 48 hours and 72 hours after transfection. After filtering through 0.45 μm
filters, viruses were used to infect cells in the presence of 4 μg/mL polybrene
(Sigma-Aldrich). At 48 hours post-infection, cells were split and selected using
hygromycin B (200 μg/mL) or puromycin (1 μg/mL) for 3 days. Cells were harvested and
lysed in EBC buffer (50 mM Tris pH 7.5, 120 mM NaCl, 0.5% NP40) supplemented
with protease inhibitors (Roche) and phosphatase inhibitors (Calbiochem) for
immunoblot analysis.

Reagents. Nocodazole (M1404) and Taxol (T7402) were purchased from Sigma.
Thymidine (CAS: 50-89-5) and cycloheximide (CAS: 66-81-9) were purchased from
Acros Organics. Palbociclib (PD0332991, S1116) and ribociclib (LEE011, S7440)
were purchased from Selleckchem. MG132 (BML-PI102-0005) was purchased
from Enzo Life Sciences. MLN4924 was a kind gift from Dr. William Kaelin (Dana-Farber
Cancer Institute).

Plasmids. Myc-tagged Cullin 1, Cullin 2, Cullin 3, Cullin 4A, Cullin 4B, Cullin 5,
Flag-tagged SPOP WT, Y87C, F102C, W131G, delta MATH, delta BTB, pLenti-
HA-SPOP WT, Y87C, F102C, W131G, pGEX-4T-1-SPOP, Flag-Keap1, Flag-Cop1,
shScramble, shCullin 3, shSPOP and His-ubiquitin constructs were described
previously³⁰. shAR, shERG, shTrim24, shDEK and sgSPOP constructs were
described previously*!**. The Myc-Cullin 7 construct was kindly provided by Dr. James
A. DeCaprio (Dana-Farber Cancer Institute). KLHL2 and KLHL3 constructs were
generous gifts from Dr. Shinichi Uchida (Tokyo Medical and Dental University).
KLHL12 and KLHL37 constructs were purchased from Addgene. The KLHL20
construct was provided by Dr. Ruey-Hwa Chen (Institute of Biological Chemistry,
Academia Sinica, Taiwan). The HA-PD-L1 construct (HA tag at the N terminus
of PD-L1) was kindly provided by Dr. Mien-Chie Hung (The University of Texas
MD Anderson Cancer Center). HA-Cdh1, HA-Cdc20, shCdh1, shCdc20 and
HA-14-3-3 isoform constructs were described previously****. pCMV-CDK4 WT,
pCMV-CDK4 N158F and shcyclin D3 were described previously****. pBabe-p16
was a kind gift from Dr. Charles J. Sherr's laboratory. pLKO-shCDK4 (Plasmid
#78153 and #78154) and pMLP-shCDK6 (Plasmid #73552 and #73553) were
purchased from Addgene. pLKO-sh14-3-3γ (TRCN0000078160,
TRCN0000078161, TRCN0000078162), pLKO-shp16 (TRCN0000039748,
TRCN0000039751, TRCN0000039782) and pLKO-shCD8a (TRCN0000057583,
TRCN0000057587) were purchased from Open Biosystems. pcDNA3-PD-L1,
pCMV-GST-PD-L1-tail (cytoplasmic amino acids), HA-PD-L1-ΔC-tail,
HA-PD-L1-Δ283-290, HA-PD-L1-S283A, HA-PD-L1-S285A, HA-PD-L1-T290M,
pLenti-PD-L1 WT, pLenti-PD-L1-Δ283-290, pLenti-PD-L1 T290M,
pET-28a-His-SPOP WT, pET-28a-His-SPOP S6A, pET-28a-His-SPOP S222A, Flag-SPOP
with delta D-Box (RxxL), pLenti-HA-c-Myc WT, pLenti-HA-c-Myc T58A/S62A,
pLenti-HA-cyclin D1, pLenti-HA-cyclin D2, pLenti-HA-cyclin D3, Flag-SPOP
S6A, and HA-tagged CDK2, CDK4 and CDK6 were generated in this study.
Antibodies. Anti-PD-L1 (E1L3N) rabbit mAb (13684), anti-pS10-H3 (3377), anti-pS780-Rb (8180), anti-pS807/811-Rb (8516), anti-Rb (9309), anti-cyclin D1 (2978), anti-cyclin D2 (3741), anti-CDK6 (3136), anti-cullin 3 (2759), anti-GST (2625), rabbit polyclonal anti-Myc-Tag (2278) and mouse monoclonal anti-Myc-Tag (2276) antibodies were purchased from Cell Signaling Technology. Mouse PD-L1 antibody (MAB90781-100) was purchased from R&D Systems.


Anti-mPD-L1 for immunoblotting (clone 298B.8E2), anti-mPD-L1 for immunohistochemistry (clone 298B.3G6) and anti-human PD-L1 for immunoprecipitation (clone 29E.12B1) were generated in the laboratory of Dr. Gordon J. Freeman. Anti-CDK4 (MS-616-P1) was purchased from Thermo Scientific. Anti-SPOP (16750-1-AP) was purchased from Proteintech. Anti-cyclin A (sc-751), anti-cyclin B (sc-245), anti-cyclin E (sc-247), anti-cyclin D3 (sc-182), anti-Cdh1 (sc-56312), anti-Cdc20 (sc-8358), anti-Cdc20 (sc-13162), anti-Plk1 (sc-17783), anti-TRIM24 (TIF1α, sc-271266), anti-HA (sc-805, Y-11), anti-PD-L1 (sc-50298) and anti-GST (sc-459) were obtained from Santa Cruz. Anti-GFP (8371-2) was purchased from Clontech. Anti-Flag (F-2425), anti-Flag (F-3165, clone M2), anti-Vinculin (V9131), anti-Flag agarose beads (A-2220), anti-HA agarose beads (A-2095), peroxidase-conjugated anti-mouse secondary antibody (A-4416) and peroxidase-conjugated anti-rabbit secondary antibody (A-4914) were purchased from Sigma. Anti-HA (MMS-101P) was obtained from BioLegend.
Immunoblot and immunoprecipitation analyses. Cells were lysed in EBC buffer (50 mM Tris pH 7.5, 120 mM NaCl, 0.5% NP-40) supplemented with protease inhibitors (Complete Mini, Roche) and phosphatase inhibitors (phosphatase inhibitor cocktail set I and II, Calbiochem). Protein concentrations were measured with a Beckman Coulter DU-800 spectrophotometer using the Bio-Rad protein assay reagent. Equal amounts of protein were resolved by SDS-PAGE and immunoblotted with the indicated antibodies. For immunoprecipitation analysis, 1000 µg of total cell lysate was incubated with the primary antibody-conjugated beads for 4 hours at 4 °C. The recovered immunocomplexes were washed four times with NETN buffer (20 mM Tris, pH 8.0, 100 mM NaCl, 1 mM EDTA and 0.5% NP-40) before being resolved by SDS-PAGE and immunoblotted with the indicated antibodies.
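Loading equal amounts of protein comes down to dividing a target mass by each lysate's measured concentration and then topping every sample up to a common volume. The sketch below assumes concentrations (µg/µl) already read from a Bio-Rad assay standard curve; the 30 µg target and the sample names are hypothetical illustration values, not taken from the study.

```python
def loading_plan(concentrations_ug_per_ul, target_ug=30.0):
    """For each lysate, return (lysate volume, extra lysis buffer) in µl.

    Every sample is brought to the volume required by the most dilute lysate,
    so all lanes receive the same protein mass in the same volume.
    """
    lysate_ul = {}
    for sample, conc in concentrations_ug_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        lysate_ul[sample] = target_ug / conc
    final_ul = max(lysate_ul.values())
    return {s: (v, final_ul - v) for s, v in lysate_ul.items()}


if __name__ == "__main__":
    # Hypothetical concentrations (µg/µl), for illustration only.
    plan = loading_plan({"shScr": 2.0, "shCDK4": 1.5, "shCDK6": 3.0})
    for sample, (lysate, buffer) in plan.items():
        print(f"{sample}: {lysate:.1f} µl lysate + {buffer:.1f} µl EBC buffer")
```

Equalizing volumes with lysis buffer (rather than loading different volumes) keeps detergent and salt constant across lanes, which helps bands migrate evenly.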
Immunohistochemistry (IHC) for cell pellets, xenografted tumors or human prostate tumor specimens. The cultured cells (MDA-MB-231 PD-L1 WT and KO cells; HBP-ALL shScr and shCD8 cells; KE37 shScr and shCD8 cells) were washed and fixed in 4% paraformaldehyde for 20 minutes. Cell pellets or xenografted (MDA-MB-231 PD-L1 WT or KO) tumors were embedded in TFM and frozen. Cryostat sections (10 µm) were placed on Superfrost Plus Stain slides, and samples were then permeabilized in 0.1% Triton X-100/PBS for 10 minutes. For IHC analysis, we used the UltraSensitive™ SP (Mouse) IHC Kit (KIT-9701, Fuzhou Maixin Biotech) following the manufacturer's instructions with minor modifications. The sections were incubated with 3% H2O2 for 15 min at room temperature to block endogenous peroxidase activity. After incubation in normal goat serum for 1 hour to block non-specific IgG binding, sections were treated with primary antibody (PD-L1, 298B.3G6, 18 µg/ml; CD8α, sc-53212, clone C8/144B, dilution 1:40) at 4 °C overnight. Sections were then incubated for 30 minutes with biotinylated goat anti-mouse IgG secondary antibodies (Fuzhou Maixin Biotech), followed by incubation with streptavidin-conjugated HRP (Fuzhou Maixin Biotech). Samples were developed with 3,3′-diaminobenzidine (DAB-2031, Fuzhou Maixin Biotech). Images were taken using an Olympus microscope camera and its accompanying software.

The prostate tumor specimens were obtained from Shanghai Changhai Hospital in China. Use of these specimens was approved by the Institutional Review Board of Shanghai Changhai Hospital. For IHC, the paraformaldehyde-fixed, paraffin-embedded prostate tumor samples were deparaffinized in xylene (3 × 10 min) and rehydrated through a series of graded alcohols (100%, 95%, 85% and 75%) to water. Samples were then subjected to heat-mediated antigen retrieval at 95 °C for 20 min. The subsequent IHC steps were the same as described above.

The expression level of PD-L1 in prostate cancer tumor samples was scored according to staining intensity: 0, negative; 1, weak expression; 2, intermediate expression; and 3, strong expression. The number of intraepithelial CD8+ tumor-infiltrating T lymphocytes (TILs) was counted as described by Hamanishi et al.°’. Briefly, three independent areas with the most abundant infiltration were selected under a microscopic field at 200× magnification (0.0625 mm²). The number of intraepithelial CD8+ TILs was counted manually and expressed as cells per mm². The Mann-Whitney test was used to compare PD-L1 expression between SPOP-mutated and wild-type cases. The Student's t-test was used to determine P values for the difference in CD8+ TILs between SPOP-mutated and wild-type cases. P < 0.05 was considered significant.
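The TIL density is a unit conversion: each 200× field covers 0.0625 mm², so a count per field is divided by that area. The sketch below assumes the three fields are averaged (equivalent to the total count divided by the total area of three fields). It also computes the Mann-Whitney U statistic for ordinal PD-L1 scores, giving tied values their average rank. Function names and example numbers are hypothetical, and a P value would still come from a statistics package.

```python
from statistics import mean

FIELD_AREA_MM2 = 0.0625  # area of one 200x field, as stated in the Methods


def til_density_per_mm2(field_counts, field_area_mm2=FIELD_AREA_MM2):
    """Mean CD8+ TIL count per field, converted to cells per mm^2."""
    if not field_counts:
        raise ValueError("at least one field count is required")
    return mean(field_counts) / field_area_mm2


def mann_whitney_u(x, y):
    """U statistic for sample x versus y; tied values receive their average rank."""
    pooled = sorted((value, idx) for idx, value in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j across a block of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = average_rank
        i = j + 1
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


if __name__ == "__main__":
    print(til_density_per_mm2([10, 12, 8]))  # 10 cells/field -> 160.0 cells/mm^2
    # Hypothetical IHC scores (0-3) for SPOP-mutant vs wild-type cases.
    print(mann_whitney_u([3, 2, 3, 2], [1, 0, 2, 1]))
```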
In vitro cyclin D/CDK4 kinase assays. Kinase assays were performed in a final volume of 30 µl of kinase buffer as described previously**: 50 mM HEPES (pH 7.5), 10 mM MgCl2, 1 mM DTT, 1 mM EGTA and 0.1 mM NaF, containing 10 µM ATP and 0.4 mCi [γ-32P]ATP (Perkin Elmer). 0.2 µg of CDK4/cyclin D1 (0142-0143-1, Pro-Qinase), CDK4/cyclin D2 (0142-0375-1, Pro-Qinase) or CDK4/cyclin D3 (0142-0373-1, Pro-Qinase) was used as the kinase. 2 µg of His-SPOP, His-SPOP-S6A, His-SPOP-S222A or His-SPOP-S6A/S222A mutant proteins immobilized on Ni-NTA beads were used as kinase substrates. 0.1 µg of Rb1 C-terminal recombinant protein (Cat. SC-4112, Santa Cruz) was used as a positive control for the kinase assays, and 2 µg of BSA was used as a negative control. After 60 min incubation at 30 °C, proteins were denatured, resolved by SDS-PAGE, transferred to nitrocellulose membranes and exposed to X-ray films.
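Assembling the 30 µl reaction is a C1V1 = C2V2 calculation for each buffer component. The sketch below assumes hypothetical stock concentrations (for example 1 M HEPES and 1 mM cold ATP) and a hypothetical 0.1 µg/µl kinase stock; the [γ-32P]ATP and the bead-bound substrate are left out because their volumes depend on the isotope lot and the bead slurry.

```python
FINAL_VOLUME_UL = 30.0

# (final concentration from the Methods, hypothetical stock concentration).
# Units within each tuple must match (mM with mM, µM with µM).
COMPONENTS = {
    "HEPES pH 7.5 (mM)": (50.0, 1000.0),
    "MgCl2 (mM)": (10.0, 1000.0),
    "DTT (mM)": (1.0, 100.0),
    "EGTA (mM)": (1.0, 100.0),
    "NaF (mM)": (0.1, 10.0),
    "ATP (µM)": (10.0, 1000.0),
}


def stock_volume_ul(final_conc, stock_conc, final_volume_ul=FINAL_VOLUME_UL):
    """Solve C1*V1 = C2*V2 for V1, the volume of stock to add."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final mix")
    return final_conc * final_volume_ul / stock_conc


def reaction_recipe(kinase_ug=0.2, kinase_stock_ug_per_ul=0.1):
    """Volumes (µl) of each stock plus water to reach the final volume."""
    recipe = {name: stock_volume_ul(final, stock)
              for name, (final, stock) in COMPONENTS.items()}
    recipe["CDK4/cyclin D"] = kinase_ug / kinase_stock_ug_per_ul
    water = FINAL_VOLUME_UL - sum(recipe.values())
    if water < 0:
        raise ValueError("components exceed the final volume")
    recipe["water"] = water
    return recipe


if __name__ == "__main__":
    for name, volume in reaction_recipe().items():
        print(f"{name}: {volume:.2f} µl")
```

In practice the small additions (0.3 µl each) are easier to pipette from a pre-mixed 10× buffer; the per-component version simply makes the arithmetic explicit.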
Kinase assay for Rb by immunopurified endogenous CDK4/cyclin D1 from mouse tissues. For endogenous kinase assays, endogenous CDK4 was immunoprecipitated with 6 µg of anti-CDK4 antibody (Santa Cruz, sc-23896 AC) from 2.5 mg of lysates (buffer: 20 mM Tris-HCl pH 8.0, 0.1 M KCl, 5 mM MgCl2, 10% glycerol, 0.1% Tween-20, 0.1% NP40) of livers or brains isolated from C57BL/6 mice. The association of cyclin D1 was confirmed with a cyclin D1 antibody (Abcam, ab134175). The immunopurified endogenous CDK4/cyclin D1 was used as the kinase, and 0.5 µg of Rb1 C-terminal recombinant protein (SC-4112) was used as the kinase substrate.

In vivo ubiquitination assays. PC3 or HeLa cells at 80% confluence were transfected with His-ubiquitin and the indicated constructs. 36 hours post-transfection, cells were treated with 30 µM MG132 for 6 hours and lysed in buffer A (6 M guanidine-HCl, 0.1 M Na2HPO4/NaH2PO4 and 10 mM imidazole [pH 8.0]). After sonication, the lysates were incubated with nickel-nitrilotriacetic acid (Ni-NTA) beads (QIAGEN) for 3 hours at room temperature. Subsequently, the His pull-down products were washed twice with buffer A, twice with buffer A/TI (1 volume buffer A and 3 volumes buffer TI) and once with buffer TI (25 mM Tris-HCl and 20 mM imidazole [pH 6.8]). The pull-down proteins were resolved by 2 x SDS-PAGE for immunoblotting.
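Buffer A contains 6 M guanidine-HCl, so it is usually weighed out rather than diluted from stocks. A minimal sketch of the mass calculation, assuming a hypothetical 50 ml batch and standard molar masses (guanidine hydrochloride 95.53 g/mol, imidazole 68.08 g/mol). The 0.1 M sodium phosphate is assumed to come from a pre-mixed pH 8.0 stock, because the Methods do not give the dibasic/monobasic ratio.

```python
MOLAR_MASS_G_PER_MOL = {
    "guanidine-HCl": 95.53,
    "imidazole": 68.08,
}


def grams_needed(molarity_m, volume_l, molar_mass):
    """Mass (g) = concentration (mol/l) x volume (l) x molar mass (g/mol)."""
    return molarity_m * volume_l * molar_mass


def buffer_a_masses(volume_ml=50.0):
    """Masses of the solid components of buffer A for a given batch volume."""
    volume_l = volume_ml / 1000.0
    return {
        "guanidine-HCl": grams_needed(6.0, volume_l, MOLAR_MASS_G_PER_MOL["guanidine-HCl"]),
        "imidazole": grams_needed(0.010, volume_l, MOLAR_MASS_G_PER_MOL["imidazole"]),
    }


if __name__ == "__main__":
    for reagent, grams in buffer_a_masses().items():
        print(f"{reagent}: {grams:.3f} g")
```

Because 6 M guanidine occupies a large share of the final volume, the solids are typically dissolved in less than the target volume and brought up to the mark afterwards.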

Protein half-life assays. Cells were transfected or treated under the indicated conditions. For half-life studies, cycloheximide (20 µg/ml, Sigma) was added to the medium. At the indicated time points thereafter, cells were harvested and protein abundance was measured by immunoblot analysis.
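Half-lives from cycloheximide chases are commonly estimated by assuming first-order decay: band intensities are normalized to a loading control and to time zero, and a line is fitted to the log-transformed values. The Methods do not state how, or whether, a half-life was fitted, so the following is an assumed approach with hypothetical densitometry values; the least-squares fit is written out in plain Python.

```python
import math


def normalized_abundance(target, loading):
    """Target/loading-control ratio at each time point, relative to time zero."""
    ratios = [t / l for t, l in zip(target, loading)]
    return [r / ratios[0] for r in ratios]


def half_life_hours(times_h, abundance):
    """Fit ln(abundance) = a - k*t by least squares and return ln(2)/k."""
    if len(times_h) != len(abundance) or len(times_h) < 2:
        raise ValueError("need matching times and values, at least two points")
    logs = [math.log(v) for v in abundance]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return math.log(2) / -slope


if __name__ == "__main__":
    times = [0, 2, 4, 8]
    pd_l1 = [1000, 707, 500, 250]  # hypothetical PD-L1 band intensities
    vinculin = [800, 800, 800, 800]  # hypothetical loading-control intensities
    rel = normalized_abundance(pd_l1, vinculin)
    print(f"estimated half-life: {half_life_hours(times, rel):.1f} h")
```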

Cell synchronization and FACS analyses. Cells were synchronized by nocodazole arrest or double thymidine treatment as described previously”. Cells released from nocodazole or double-thymidine arrest were collected at the indicated time points, fixed in 70% ethanol at -20 °C overnight and washed three times with cold PBS. The samples were digested with RNase for 30 minutes at 37 °C and stained with propidium iodide (Roche) according to the manufacturer's instructions. Stained cells were acquired on a BD FACSCanto™ II flow cytometer, and the data were analyzed with ModFit LT 4.1 and FCS Express 5 software.

BrdU/PI labelling and FACS analyses. Cells were incubated for 1 hour in medium with or without BrdU (75 µM, Sigma). Cells were harvested and washed once with cold PBS by centrifugation for 5 min at 1,200 rpm. Cells were resuspended in 200 µl cold PBS and added to 5 ml of cold 90% ethanol for fixation overnight. After centrifugation for 5 min at 1,200 rpm, cells were washed once with 5 ml PBS and incubated in 0.5 ml 2 N HCl/0.5% Triton X-100 for 30 min at room temperature (RT). After adding 5 ml PBS, samples were centrifuged for 5 min at 1,200 rpm and resuspended in 1 ml Na2B4O7 (pH 8.5). Samples were then resuspended in 200 µl of anti-BrdU antibody diluted 1:40 in PBS with 0.5% Tween 20 and 1% BSA and incubated for 30 min at room temperature. After adding 5 ml 20 mM HEPES-PBS (pH 7.4) with 0.5% Tween 20, samples were centrifuged for 5 min at 1,200 rpm and resuspended in 0.5 ml PBS with propidium iodide (PI, 5 µg/ml, Sigma) and RNase A (200 µg/ml, Roche). After incubation for 30 min at RT, samples were transferred into FACS tubes and analyzed by flow cytometry.
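On a BrdU/PI plot, BrdU-positive events are in S phase, while BrdU-negative events are split by DNA content (PI) into G0/G1 and G2/M. The sketch below uses simple rectangular gates with hypothetical cut-offs; real gates are drawn on the bivariate plot in the analysis software, and the event values are invented for illustration.

```python
def classify_event(brdu, pi, brdu_cutoff, g1_pi_max):
    """Assign one event to a cell-cycle phase using rectangular gates."""
    if brdu >= brdu_cutoff:
        return "S"
    return "G0/G1" if pi <= g1_pi_max else "G2/M"


def phase_fractions(events, brdu_cutoff, g1_pi_max):
    """Fraction of (BrdU, PI) events falling into each phase gate."""
    if not events:
        raise ValueError("no events to classify")
    counts = {"G0/G1": 0, "S": 0, "G2/M": 0}
    for brdu, pi in events:
        counts[classify_event(brdu, pi, brdu_cutoff, g1_pi_max)] += 1
    return {phase: n / len(events) for phase, n in counts.items()}


if __name__ == "__main__":
    events = [(12, 50), (9, 54), (480, 78), (15, 101)]  # (BrdU, PI) intensities
    print(phase_fractions(events, brdu_cutoff=100, g1_pi_max=70))
```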

Real-Time RT-PCR analyses. Total RNA was extracted using the QIAGEN RNeasy mini kit, and reverse transcription reactions were performed using ABI TaqMan Reverse Transcription Reagents (N808-0234). After the generated cDNA templates were mixed with primers/probes and ABI TaqMan Fast Universal PCR Master Mix (4352042), reactions were performed with the ABI 7500 Fast Real-Time PCR system and SYBR Green qPCR Master Mix (600828) from Agilent Technologies Stratagene. The following primers were used:

Human GAPDH: forward, 5'-GGAGCGAGATCCCTCCAAAAT-3'; reverse, 5'-GGCTGTTGTCATACTTCTCATGG-3';

Mouse GAPDH: forward, 5'-AGGTCGGTGTGAACGGATTTG-3'; reverse, 5'-GGGGTCGTTGATGGCAACA-3';

Human PD-L1: forward, 5'-TGGCATTTGCTGAACGCATTT-3'; reverse, 5'-TGCAGCCAGGTCTAATTGTTTT-3';

Mouse PD-L1: forward, 5'-GCTCCAAAGGACTTGTACGTG-3'; reverse, 5'-TGATCTGAAGGGCAGCATTTC-3'.
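With GAPDH as the reference gene, relative PD-L1 mRNA levels are often computed with the comparative Ct (2^-ΔΔCt) method. The Methods do not name the quantification method, so this is an assumed sketch with hypothetical Ct values; it also assumes near-100% amplification efficiency for both primer pairs.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_control, ct_reference_control):
    """Fold change of the target gene versus the control sample by 2^-ddCt."""
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values for a treated and a control sample.
    treated = {"PD-L1": [25.1, 24.9, 25.0], "GAPDH": [18.0, 18.1, 17.9]}
    control = {"PD-L1": [27.0, 27.1, 26.9], "GAPDH": [18.0, 17.9, 18.1]}
    fold = relative_expression(
        mean(treated["PD-L1"]), mean(treated["GAPDH"]),
        mean(control["PD-L1"]), mean(control["GAPDH"]),
    )
    print(f"PD-L1 fold change: {fold:.2f}")
```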

Generation of cyclin D-deficient MEFs. Cyclin D1-/-, D2-/-, D3-/- and D1F/F D2-/- D3F/F MEFs were derived from E13.5 mouse embryos as described previously**".
Generation of mouse tumors. Cyclin D1-/- mice’? were mated with MMTV-c-Myc or MMTV-Wnt1 mice (from the Jackson Laboratory), yielding cyclin D1-/-/MMTV-c-Myc or cyclin D1-/-/MMTV-Wnt1 mice, as well as control cyclin D1+/+/MMTV-c-Myc or D1+/+/MMTV-Wnt1 mice. Mammary tumors were dissected from multiparous females and snap-frozen.

MMTV-ErbB2 female mice (from the Jackson Laboratory), bred into a mixed C57BL/6 and 129Sv background, were treated with palbociclib or vehicle only for 6 weeks after detection of palpable tumors. Palbociclib was administered daily by gastric gavage (150 mg/kg of body weight); every two weeks the daily dose was lowered to 100 mg/kg for 2-3 days. Control mice were treated with vehicle (10% 0.1 N HCl, 10% Cremophor EL, 20% PEG300, 60% 50 mM citrate buffer pH 4.5) at 10 ml/kg by gastric gavage. After 6 weeks, tumors were collected and snap-frozen in OCT.
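Gavage doses scale with body weight. A minimal sketch, assuming palbociclib was formulated in the same 10 ml/kg volume stated for the vehicle (the Methods give the volume only for vehicle), which implies a 15 mg/ml formulation at 150 mg/kg; the 25 g mouse weight is hypothetical.

```python
def dose_mg(body_weight_g, dose_mg_per_kg):
    """Drug mass for one animal: mg/kg x kg body weight."""
    return body_weight_g * dose_mg_per_kg / 1000.0


def gavage_volume_ml(body_weight_g, volume_ml_per_kg=10.0):
    """Volume to administer at a fixed ml/kg dosing volume."""
    return body_weight_g * volume_ml_per_kg / 1000.0


def formulation_mg_per_ml(dose_mg_per_kg, volume_ml_per_kg=10.0):
    """Concentration needed so that the fixed ml/kg volume delivers the dose."""
    return dose_mg_per_kg / volume_ml_per_kg


if __name__ == "__main__":
    weight = 25.0  # hypothetical mouse weight in grams
    print(f"150 mg/kg: {dose_mg(weight, 150):.2f} mg in {gavage_volume_ml(weight):.2f} ml "
          f"of a {formulation_mg_per_ml(150):.0f} mg/ml formulation")
    print(f"100 mg/kg (reduced-dose days): {dose_mg(weight, 100):.2f} mg")
```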

Treatment of wild-type mice with palbociclib. Six-week-old C57BL/6 female mice (from the Jackson Laboratory) were treated with palbociclib (150 mg/kg body weight, by gastric gavage) or vehicle only for 7 days. Subsequently, organs were collected and analyzed by immunoblotting.

Mouse tumor implantation. 1 x 10° B16-F10 or 2 x 10° MC38 cells were injected subcutaneously into 6-week-old C57BL/6 female mice (from the Jackson Laboratory). Starting one week later, mice were treated daily with palbociclib (150 mg/kg body weight, by gastric gavage) or vehicle only for 7 days. Subsequently, tumors were collected and analyzed by FACS or immunoblotting.

1 x 10° B16-F10 cells stably expressing SPOP WT or the F102C mutant were injected subcutaneously into 6-week-old C57BL/6 female mice (from the Jackson Laboratory). On day 3 after tumor cell injection, control and PD-L1 mAb treatments were administered by intraperitoneal injection (200 µg/mouse in 200 µl HBSS saline buffer) every three days for a total of 3 injections. Subsequently, tumors were collected and analyzed by FACS.

1 x 10° B16-F10 cells stably expressing SPOP WT or the F102C mutant were injected subcutaneously into 6-week-old Tcra-/- female mice (from the Jackson Laboratory). After 10 days, tumors were collected and analyzed by FACS.
Immunofluorescence staining of cells or tumor tissues. MDA-MB-231 PD-L1 WT and KO cells were seeded in chamber slides (154534, Thermo Fisher Scientific). Cells were fixed with 4% paraformaldehyde for 20 minutes, followed by permeabilization with 0.1% Triton X-100 in PBS for 10 minutes. Cells were pre-blocked with 2% BSA/PBS for 45 minutes, incubated with primary antibody against PD-L1 (298B.3G6, 1:200) for 2.5 hours at room temperature and then with anti-mouse secondary antibodies conjugated with Alexa Fluor 568 (Invitrogen, 1:250). Hoechst (Life Technologies, 1:10,000) was used to stain nuclei.

TFM-embedded 10-µm-thick tumor tissue sections were fixed with 2% paraformaldehyde/PBS for 30 min and permeabilized in 0.1% Triton X-100/PBS for 10 min. Tumor tissue sections were pre-blocked with 2% BSA/PBS for 45 min, incubated with primary antibodies against PD-L1 (1:200) and CD3 (Abcam, 1:250) for 2.5 hours at room temperature, and then with anti-mouse secondary antibodies conjugated with Alexa Fluor 568 (Invitrogen, 1:250) and anti-rabbit secondary antibodies conjugated with Alexa Fluor 488 (Invitrogen, 1:250). Hoechst (Life Technologies, 1:10,000) was used to stain nuclei. Tumor tissues were mounted with Fluoromount-G® (SouthernBiotech) at 4 °C overnight. Tissue sections were examined with a fluorescence microscope under a 20× objective lens. CD3+ cells were counted in an area of 5.95 x 10° µm².

Single cell generation from tumor tissue and flow cytometry analysis. Tumor tissues were minced and digested with 5 ml of 2 mg/ml collagenase (Sigma) in DMEM for 1 hour at 37 °C. Cells were then collected by centrifugation and filtered through a 70-µm strainer in DMEM. Cell pellets were resuspended and lysed in red blood cell lysis buffer for 5 min. The cells were then filtered through a 40-µm strainer in 1× PBS with 2% BSA. One million cells were incubated with APC-conjugated antibodies against PD-L1 (564715, BD Biosciences, 1:100) or CD3 (BioLegend, 1:100), or with the corresponding isotype IgG1 control, at room temperature for 30 min. Cells were washed with 1× PBS with 2% BSA and analyzed by flow cytometry.

In vivo experimental therapy in MC38 and CT26 mouse tumor models. Animal studies were approved by the Dana-Farber Cancer Institute Institutional Animal Care and Use Committee (IACUC; protocol number 04-047) and performed in accordance with the NIH Guide for the Care and Use of Laboratory Animals. MC38 or CT26 tumors were established by subcutaneously injecting 1 x 10° MC38 or CT26 tumor cells in 100 µl HBSS into the right flank of 6-week-old C57BL/6 or BALB/c female mice (Jackson Lab, ME). Tumor sizes were measured by caliper every three days after implantation, and tumor volume was calculated as length × width² × 0.5. On day 7 after tumor cell injection, animals were pooled and randomly divided into four groups with comparable average tumor size. The lab members who measured the mice were blinded to the treatment groups. Mice were grouped into control antibody treatment, PD-1 mAb treatment, CDK4/6 inhibitor treatment, and PD-1 mAb plus CDK4/6 inhibitor treatment. As illustrated in Extended Data Fig. 10a, control and PD-1 mAb treatments were administered by intraperitoneal injection (200 µg/mouse in 200 µl HBSS saline buffer) every three days for a total of 8 injections. The CDK4/6 inhibitor was given by oral gavage once a day at 100 mg/kg for three weeks, with a one-day break each week. For survival studies, tumor volumes were monitored every three days for 120 days after initial treatment, or until tumor volume exceeded 2,000 mm³ or tumor ulceration reached 1 cm in diameter. Statistical analysis was conducted using GraphPad Prism software (GraphPad Software, Inc., San Diego, CA). Kaplan-Meier curves and the corresponding Gehan-Breslow-Wilcoxon tests were used to evaluate statistical differences between groups in survival studies. P < 0.05 was considered significant.
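The caliper formula and the survival endpoint translate directly into code. A minimal sketch, assuming length is the longer diameter in mm and that a mouse's event day is the first measurement at which volume exceeds 2,000 mm³; the ulceration criterion is not modelled, and a mouse that never crosses the threshold is treated as censored at its last measurement. Function names and measurements are hypothetical.

```python
ENDPOINT_MM3 = 2000.0


def tumor_volume_mm3(length_mm, width_mm):
    """Caliper formula from the Methods: length x width^2 x 0.5."""
    return length_mm * width_mm ** 2 * 0.5


def survival_record(measurements, endpoint_mm3=ENDPOINT_MM3):
    """Return (day, event) as Kaplan-Meier input.

    measurements: list of (day, length_mm, width_mm) in chronological order.
    event is True if the endpoint was exceeded, False if censored at the last day.
    """
    if not measurements:
        raise ValueError("no measurements")
    for day, length, width in measurements:
        if tumor_volume_mm3(length, width) > endpoint_mm3:
            return day, True
    return measurements[-1][0], False


if __name__ == "__main__":
    # Hypothetical caliper readings taken every three days.
    mouse = [(7, 6.0, 5.0), (10, 9.0, 8.0), (13, 15.0, 13.0), (16, 18.0, 16.0)]
    print(survival_record(mouse))
```

The (day, event) pairs are the standard input format for Kaplan-Meier estimation in survival software such as GraphPad Prism.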

T cell analysis for MC38 implanted tumors. MC38 implanted tumors were established by subcutaneously injecting 1 x 10° MC38 cells into the right flank of 6-week-old C57BL/6 female mice (Jackson Lab). On the day of tumor cell injection, mice were randomly divided into four groups: control antibody treatment, PD-1 mAb treatment, CDK4/6 inhibitor treatment, and PD-1 mAb plus CDK4/6 inhibitor treatment. Control and PD-1 mAb treatments were administered by intraperitoneal injection (200 µg/mouse in 200 µl HBSS saline buffer) every three days for a total of 4 injections. Palbociclib was given by oral gavage at 200 mg/kg for 9 days, with a break after 7 days. Tumors were then collected, and single-cell suspensions were generated from tumor tissues as described in the section "Single cell generation from tumor tissue and flow cytometry analysis". After filtration through a 40-µm strainer, cells were fixed in 0.5 ml/tube Fixation Buffer (420801, BioLegend) in the dark for 20 minutes at room temperature. Cells were then washed with 1× PBS with 2% BSA. The fixed cells were permeabilized by resuspension and two rounds of centrifugation in Intracellular Staining Perm Wash Buffer (421002, BioLegend). To assess T cell activity, cells were co-stained with antibodies against CD3 (100236, APC conjugated, BioLegend), Granzyme B (515403, FITC conjugated, BioLegend) and IFN-γ (505808, PE conjugated, BioLegend). Alternatively, cells were co-stained with antibodies against CD3 (100236, APC conjugated, BioLegend), CD4 (100510, FITC conjugated, BioLegend) and CD8 (100708, PE conjugated, BioLegend). The corresponding isotype IgG1 antibodies were used as controls. The cells were incubated with the corresponding antibodies for 30 minutes at room temperature, washed with 1× PBS with 2% BSA and analyzed by flow cytometry.

Data Availability. Source data for gels in Figs 1-4 and Extended Data Figs 1-9 are available in Supplementary Fig. 1. Source data for Figs 2j and 2k are available
in Table 1. Source data for Figs 3j-n and 3o are available in Table 2. Source data for Fig. 4k, l are available in Table 3. Source data for Extended Data Figs 3c, d are available in Table 4. Source data for Extended Data Figs 6j, l, o, p, r, s are available in
Table 5. Source data for Extended Data Figs 10b-j are available in Table 6. All other 
data supporting the findings of this study are available from the corresponding author upon reasonable request.
Extended Data Figure 1 | PD-L1 fluctuates during cell cycle progression. a, b, Immunoblot (IB) of whole cell lysates (WCL) derived from MDA-MB-231 or HCC1954 cells synchronized in M phase by nocodazole treatment prior to release back into the cell cycle for the indicated times. c, d, Quantitative real-time PCR (qRT-PCR) analyses of relative mRNA levels of PD-L1 and GAPDH in samples derived from HeLa cells synchronized in M phase by nocodazole treatment prior to release back into the cell cycle for the indicated time points. e, IB of WCL derived from HeLa cells pre-treated with or without IFN-γ (10 ng/ml) for 12 hours and then synchronized in M phase by nocodazole treatment prior to release back into the cell cycle for the indicated times. f, IB of WCL derived from HeLa cells stably expressing HA-c-Myc WT or HA-T58A/S62A-c-Myc, with empty vector (EV) as a negative control. g, IB of WCL derived from HeLa cells with or without stable expression of HA-c-Myc WT, synchronized in M phase by nocodazole treatment prior to release back into the cell cycle for the indicated times. h-j, IB of WCL derived from MC38, CT26, 4T1 or B16-F10 mouse tumor cells treated with the indicated concentrations of nocodazole for 20 hours before harvesting. k-m, IB of WCL derived from B16-F10, 4T1 or CT26 mouse tumor cells treated with the indicated concentrations of taxol for 20 hours before harvesting.
Extended Data Figure 2 | Cyclin D/CDK4 negatively regulates PD-L1 protein stability. a, b, Immunoblot (IB) analysis of whole cell lysates (WCL) derived from wild-type (WT) and cyclin A1-/-A2-/- MEFs, or WT and cyclin E1-/-E2-/- MEFs. c, Quantitative real-time PCR (qRT-PCR) analysis of relative mRNA levels of PD-L1 in wild-type and cyclin D1-/-D2-/-D3-/- MEFs. Data are represented as mean ± s.d., n = 5. d, Cell cycle profiles of WT and cyclin D1-/-D2-/-D3-/- MEFs, which were labeled with BrdU and analyzed by FACS. e, IB analysis of WCL derived from cyclin D1F/F D2-/- D3F/F MEFs with or without deletion of cyclin D1 and cyclin D3 by pLenti-Cre viral infection (pLenti-EGFP as a negative control), selected with puromycin (1 µg/ml) for 72 hours before harvesting. f, IB analysis of WCL derived from cyclin D1-/-D2-/-D3-/- MEFs stably re-expressing cyclin D1, cyclin D2 or cyclin D3, respectively, with empty vector (EV) as a negative control. g, IB analysis of WCL derived from mouse mammary tumors induced by MMTV-c-Myc with or without genetic deletion of cyclin D1. n = 5 mice per experimental group. h, IB analysis of WCL derived from wild-type and Cdk6-/- MEFs. i, j, IB analysis of WCL derived from MDA-MB-231


LETTER 


cells stably expressing shCDK6 or shCDK2, respectively, with shScr as a negative control. k, l, IB analysis of WCL derived from MDA-MB-231 cells transfected with the indicated constructs (k); the intensity of the PD-L1 band was quantified with ImageJ software (l). m, IB analysis of WCL derived from MDA-MB-231 cells depleted of Rb (with shScr as a negative control) and treated with the CDK4/6 inhibitor palbociclib where indicated. n, o, IB analysis of WCL derived from mouse CT26 or 4T1 tumor cell lines treated with or without the CDK4/6 inhibitors palbociclib or ribociclib, respectively. p, q, IB analysis of WCL derived from MDA-MB-231 cells pre-treated with palbociclib (1 µM) for 36 hours before treatment with cycloheximide (CHX) for the indicated time points (p); PD-L1 protein abundance was quantified with ImageJ and plotted (q). r, IB analysis of WCL derived from 19 different cancer cell lines with the indicated antibodies. s-u, IB analysis of WCL derived from MCF7, T47D or HLF cells stably expressing p16, with EV as a negative control. v-x, IB analysis of WCL derived from MDA-MB-436, BT549 or HCC1937 cells stably expressing three independent shRNAs against p16, with shScr as a negative control.
Extended Data Figure 3 | Treatment with the CDK4/6 inhibitor palbociclib elevates PD-L1 levels in vivo. a, b, Immunoblot (IB) analysis of whole cell lysates (WCL) derived from MC38 or B16-F10 mouse tumor cell line implanted tumors treated with palbociclib (150 mg/kg body weight, by gastric gavage) or vehicle for 7 days. n = 5 mice per experimental group. c, FACS analysis of PD-L1 or CD3+ T-cell populations in MC38 implanted tumors treated with vehicle or palbociclib for 7 days. n = 5 mice per experimental group. d, IB analysis of WCL derived from multiple organs of mice treated with palbociclib (150 mg/kg body weight, by gastric gavage) or vehicle for 7 days. n = 5 mice per experimental group. e, Quantification of PD-L1 protein band intensity in Extended Data Fig. 3d using ImageJ software. n = 5 mice per experimental group. f, IB analysis of WCL derived from 15 different tissues with or without palbociclib treatment and from MMTV-c-Myc-induced breast tumors. g, Quantification of PD-L1 protein band intensity in Extended Data Fig. 3f using ImageJ software. n = 3 biological replicates. h, In vitro kinase assay for Rb using the CDK4/cyclin D kinase complex immunoprecipitated from liver or brain with an anti-CDK4 antibody. Note that the cyclin D-CDK4 complex in non-dividing organs (liver and brain) displayed kinase activity, which might explain why the CDK4/6 inhibitor elevated PD-L1 in these organs. Error bars, ± s.d.; two-tailed t-test, *P < 0.05, **P < 0.01, ***P < 0.001.
Extended Data Figure 4 | Cullin 3-SPOP promotes PD-L1 ubiquitination and subsequent degradation largely through interaction with the cytoplasmic tail of PD-L1. a, Schematic illustration of PD-L1 showing the N-terminal signal peptide, extracellular domain, transmembrane domain, cytoplasmic tail and the potential SPOP-binding motif. b, d, Immunoblot (IB) analysis of whole cell lysates (WCL) and GST pull-down precipitates derived from 293T cells transfected with the indicated constructs and treated with MG132 (10 µM) for 12 hours before harvesting. c, IB analysis of WCL derived from PC3 cells stably expressing shCullin 3. e, g, IB analysis of WCL and immunoprecipitates (IP) derived from 293T cells transfected with the indicated constructs and treated with MG132 (10 µM) for 12 hours before harvesting. f, IB of WCL and Ni-NTA pull-down products derived from lysates of PC3 cells transfected with the indicated constructs. Cells were treated with MG132 (30 µM) for 6 hours before harvesting and lysed in denaturing buffer. h, IB analysis of WCL and IP derived from 293T cells transfected with the indicated constructs and treated with MG132 (10 µM) for 12 hours before harvesting. i, IB of WCL derived from MDA-MB-231 PD-L1 KO cells stably expressing PD-L1 WT, Δ283-290 or T290M, with EV as a negative control. j, IB analysis of WCL derived from 293T cells transfected with HA-PD-L1 WT or the T290M mutant and treated with cycloheximide (CHX) for the indicated time points before harvesting. k, IB of WCL and Ni-NTA pull-down products derived from lysates of PC3 cells transfected with the indicated constructs. Cells were treated with MG132 (30 µM) for 6 hours before harvesting and lysed in denaturing buffer. l, IB of WCL derived from 293T cells transfected with the indicated constructs.
Extended Data Figure 5 | SPOP negatively regulates PD-L1 protein stability in a poly-ubiquitination-dependent manner. a-c, Immunoblot (IB) analysis of whole cell lysates (WCL) derived from 293T cells transfected with the indicated constructs. d, e, IB analysis of WCL derived from 293T cells transfected with the indicated constructs. 36 h post-transfection, cells were treated with 20 µg/ml cycloheximide (CHX) for the indicated time points (d). PD-L1 protein abundance was quantified with ImageJ software and plotted (e). f, IB of WCL and Ni-NTA pull-down products derived from lysates of PC3 cells transfected with the indicated constructs. Cells were treated with MG132 (30 µM) for 6 hours before harvesting and lysed in denaturing buffer. g, Schematic illustration of SPOP, whose MATH and BTB domains interact with substrates and Cullin 3, respectively. h, IB analysis of WCL and IP derived from 293T cells transfected with the indicated constructs and treated with MG132 (10 µM) for 12 hours before harvesting. i, IB analysis of WCL derived from 293T cells transfected with the indicated constructs. j, RT-PCR analysis of relative mRNA levels of PD-L1 in Spop+/+ and Spop-/- MEFs. Data are represented as mean ± s.d., n = 5. k, IB analysis of WCL derived from PC3 cells infected with the indicated lentiviral shRNAs against SPOP and selected with puromycin (1 µg/ml) for 72 hours before harvesting. l, m, IB analysis of WCL derived from C42 cells depleted of SPOP using sgRNA and treated with cycloheximide (CHX) for the indicated time points before harvesting (l). PD-L1 protein abundance was quantified with ImageJ software and plotted (m). n, o, IB analysis of WCL derived from LNCaP cells stably expressing shAR or shERG, with shScr as a negative control. p, q, IB analysis of WCL derived from DU145 cells stably expressing shTrim24 or shDEK, with shScr as a negative control. r-u, IB analysis of WCL derived from C42 SPOP WT and SPOP-/- cells stably expressing shAR, shERG, shTrim24, shDEK or shScr as a control.


© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Extended Data Figure 6 image: SPOP mutation data from TCGA (234 mutations in 212 samples; MATH domain, BTB domain and other positions); PD-L1 (CD274) C-tail and SPOP MATH domain alterations across all cancer types (N = 10,188); IB of PD-L1, TRIM24 and Vinculin; B16-F10 growth curves and cell-cycle profiles (EV, SPOP WT, F102C); 22Rv1 cell-cycle profiles; B16-F10 tumors from C57BL/6 and Tcra-/- mice and 4T1 tumors from BALB/c mice, with or without anti-PD-L1; excised tumor mass (g).]
Extended Data Figure 6 | Cancer-derived SPOP mutations fail to
promote PD-L1 degradation. a, The mutation frequency (mutated cases/
total cases) of SPOP across 24 cancer types from the TCGA database.
Mutations are categorized as occurring in the MATH domain, in the BTB
domain or at any other position of the gene, including UTRs. Because
some patient cases carry mutations in two or three categories, the
proportions of the three colors are allocated mutation-wise rather than
case-wise. b, The distribution of SPOP mutation positions in 24 cancer
types from the TCGA database. Mutations with low translational
consequences have been discarded. c, Immunoblot (IB) analysis of whole
cell lysates (WCL) derived from 293T cells transfected with the indicated
constructs. d, The mutation frequency (mutated cases/total cases) of
PD-L1 (CD274) across 19 cancer types from the TCGA database. e, Oncoplot
of PD-L1 (CD274) and SPOP across all 39 cancer types in the TCGA
database. Only mutations or truncations in the C-terminal tail of PD-L1
or in the MATH domain of SPOP are counted. f, IB of WCL derived from
the B16-F10 mouse tumor cell line stably expressing the indicated SPOP
constructs. g, h, Growth curves and cell-cycle profiles of B16-F10 cells
stably expressing SPOP WT or the F102C mutant, with empty vector (EV)
as a negative control. i, Cell-cycle profile of 22Rv1 cells stably
expressing SPOP WT or the F102C mutant, with EV as a negative control.
j, FACS analysis of relative cell-surface PD-L1 expression in implanted
4T1 tumors ectopically expressing SPOP-WT or the SPOP-F102C mutant.
n = 5 mice per experimental group. k, Photographs of tumors dissected,
after euthanasia, from C57BL/6 mice implanted with B16-F10 cells stably
expressing SPOP-WT or the SPOP-F102C mutant. l, FACS analysis of the
number of CD3+ T cells among tumor-infiltrating lymphocytes isolated
from implanted 4T1 tumors stably expressing SPOP-WT or the SPOP-F102C
mutant. n = 5 mice per experimental group. m, Photographs of tumors
dissected, after euthanasia, from anti-PD-L1-treated C57BL/6 mice
implanted with B16-F10 cells stably expressing SPOP-WT or the
SPOP-F102C mutant. n = 7 mice per experimental group. n, Weights of
implanted B16-F10 tumors from C57BL/6 mice treated with anti-PD-L1
antibody. n = 12 mice per experimental group. o, FACS analysis of
relative cell-surface PD-L1 expression in implanted B16-F10 tumors
ectopically expressing SPOP-WT or the SPOP-F102C mutant and treated
with anti-PD-L1 antibody. n = 5 mice per experimental group. p, FACS
analysis of the number of CD3+ T cells among tumor-infiltrating
lymphocytes isolated from implanted B16-F10 tumors ectopically
expressing SPOP-WT or the SPOP-F102C mutant and treated with control
IgG or anti-PD-L1 antibody. n = 7 mice per experimental group.
q, Photographs of tumors dissected, after euthanasia, from Tcra−/− mice
implanted with B16-F10 cells stably expressing SPOP-WT or the
SPOP-F102C mutant. n = 7 mice per experimental group. r, FACS analysis
of relative cell-surface PD-L1 expression in implanted tumors from
Tcra−/− mice, derived from B16-F10 cells stably expressing SPOP-WT or
the SPOP-F102C mutant. n = 7 mice per experimental group. s, FACS
analysis of the number of CD3+ T cells among tumor-infiltrating
lymphocytes isolated from implanted tumors in Tcra−/− mice, derived from
B16-F10 cells stably expressing SPOP-WT or the SPOP-F102C mutant.
n = 7 mice per experimental group. Error bars, ± s.d.; two-tailed t-test;
*P < 0.05, **P < 0.01, ***P < 0.001; NS, not significant.
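The mutation-wise allocation used in panel a can be illustrated with a small sketch. It assumes a hypothetical toy cohort in which each case is a list of SPOP mutation categories ("MATH", "BTB", "other"); the data and function names are illustrative only and are not taken from TCGA or from the authors' analysis.

```python
from collections import Counter

# Hypothetical toy cohort: each case lists the categories of its SPOP mutations,
# following panel a ("MATH", "BTB", or "other" = any other position, incl. UTRs).
cases = [
    ["MATH"],
    ["MATH", "BTB"],  # one case carrying mutations in two categories
    ["other"],
    [],               # unmutated case
    [],
]

def mutation_frequency(cases):
    """Bar height: mutated cases / total cases."""
    mutated = sum(1 for c in cases if c)
    return mutated / len(cases)

def mutation_wise_proportions(cases):
    """Colour split counting every mutation once; always sums to 1."""
    counts = Counter(cat for c in cases for cat in c)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def case_wise_fractions(cases):
    """Fraction of mutated cases carrying each category; can sum to > 1
    when some cases carry mutations in several categories."""
    mutated = [c for c in cases if c]
    counts = Counter(cat for c in mutated for cat in set(c))
    return {cat: n / len(mutated) for cat, n in counts.items()}

print(mutation_frequency(cases))         # 0.6
print(mutation_wise_proportions(cases))  # {'MATH': 0.5, 'BTB': 0.25, 'other': 0.25}
```

Counted case-wise, the case carrying both MATH and BTB mutations contributes to two categories, so the fractions total more than 1; counting mutations instead keeps the three colour proportions summing to 1 within each bar.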
Extended Data Figure 7 | Validation of anti-PD-L1 and anti-CD8
antibodies using PD-L1 KO or shCD8 cells. a, Immunoblot (IB) analysis of
whole cell lysates (WCL) derived from MDA-MB-231 cells depleted of PD-L1
using the CRISPR-Cas9 system. b, Immunofluorescence (IF) of MDA-MB-231
PD-L1 WT and KO cells using the anti-PD-L1 antibody. The scale bar
represents 50 μm. c, d, Immunohistochemistry (IHC) of MDA-MB-231 PD-L1
WT and KO cells cultured on glass slides (c) or grown as implanted
tumors (d) using the anti-PD-L1 antibody. The scale bar represents
50 μm. e, f, IB analysis of WCL derived from HBP-ALL (e) or KE37 (f) cells
stably expressing shCD8, or shScr as a negative control, using the
anti-CD8 antibody. g, h, IHC of HBP-ALL (g) or KE37 (h) cell pellets
stably expressing shCD8, or shScr as a negative control, using the
anti-CD8 antibody. The scale bar represents 50 μm.
Extended Data Figure 8 | Depletion of Cdh1, but not Cdc20, prolongs
SPOP protein stability, which is accompanied by a decrease in PD-L1
protein levels. a-c, Immunoblot (IB) analysis of whole cell lysates (WCL)
derived from HeLa cells depleted of SPOP using the CRISPR-Cas9 system (a)
or depleted of Cdc20 or Cdh1 using multiple independent shRNAs (b, c).
d, IB analysis of WCL and immunoprecipitation (IP) derived from 293T
cells transfected with the indicated constructs and treated with MG132
(10 μM) for 12 hours before harvesting. e, IB analysis of WCL and IP
derived from HeLa cells treated with MG132 (10 μM) for 12 hours before
harvesting. f, Sequence comparison of the D-box motif (RxxLxxxxN) in
SPOP across species. g, IB analysis of WCL derived from HeLa cells
transfected with the indicated constructs. h, i, IB analysis of WCL
derived from 293T cells transfected with the indicated constructs. At
36 h after transfection, cells were treated with cycloheximide (CHX) for
the indicated times before harvesting (h). The abundance of SPOP-WT and
the RxxL-deletion mutant was quantified using ImageJ software (i).
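Panels h and i quantify protein abundance during a cycloheximide chase. A minimal sketch of one common normalization is shown below, assuming band intensities exported from ImageJ for the target protein and for a loading control; the caption does not state which loading control or normalization was used, so the inputs and the `relative_abundance` helper are hypothetical.

```python
def relative_abundance(target, loading, ref_index=0):
    """Normalize each target band to its loading-control band, then express
    it relative to the reference time point (by default the first, t = 0)."""
    if len(target) != len(loading):
        raise ValueError("target and loading must have the same length")
    ratios = [t / l for t, l in zip(target, loading)]
    return [r / ratios[ref_index] for r in ratios]

# Hypothetical ImageJ band intensities at three CHX time points (e.g. 0, 2, 4 h)
spop_bands = [1000.0, 600.0, 300.0]
loading_bands = [500.0, 500.0, 500.0]
print(relative_abundance(spop_bands, loading_bands))  # [1.0, 0.6, 0.3]
```

Normalizing to t = 0 lets curves for SPOP-WT and a mutant be plotted on the same scale even when their starting expression differs.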
Extended Data Figure 9 | Cyclin D/CDK4-mediated phosphorylation of
SPOP at the Ser6 residue promotes its binding with 14-3-3γ, reducing its
poly-ubiquitination and subsequent degradation by APC/Cdh1.
a, Sequence comparison of conserved SP sites and the putative 14-3-3γ
binding motif in SPOP. b, Immunoblot (IB) analysis of whole cell lysates
(WCL) and immunoprecipitation (IP) derived from 293T cells transfected
with the indicated constructs and treated with MG132 (10 μM) for 12 hours
before harvesting. c, d, In vitro kinase assays using recombinant Rb and
SPOP as substrates and cyclin D1/CDK4, cyclin D2/CDK4 or cyclin D3/CDK4
as the kinase complex. BSA was used as a negative control where
indicated. e, IB analysis of WCL and IP derived from MDA-MB-231 cells
transfected with the indicated constructs and treated with or without
palbociclib (1 μM) for 12 hours. f, Streptavidin bead pull-down assay of
biotin-labeled SPOP peptide, with or without phosphorylation at Ser6, to
examine its in vitro association with 14-3-3γ. g, IB analysis of WCL and
GST pull-down precipitates derived from 293T cells transfected with the
indicated constructs and treated with MG132 (10 μM) for 12 hours before
harvesting. h, i, IB analysis of WCL and IP derived from 293T cells
transfected with the indicated constructs and treated with MG132 (10 μM)
for 12 hours before harvesting. j, k, IB analysis of WCL derived from 293T
cells transfected with the indicated constructs. At 36 h after
transfection, cells were treated with 20 μg/ml cycloheximide (CHX) for the
indicated times (j). The abundance of SPOP-WT and the S6A mutant was
quantified using ImageJ software and plotted accordingly (k). l, p, IB of
WCL and Ni-NTA pull-down products derived from lysates of PC3 cells
transfected with the indicated constructs. Cells were treated with MG132
(30 μM) for 6 hours before harvesting and lysed in denaturing buffer for
the subsequent assay. m-o, IB analysis of WCL and IP derived from 293T
cells transfected with the indicated constructs and treated with MG132
(10 μM), with or without palbociclib (1 μM), for 12 hours before
harvesting. q-s, IB of WCL derived from PC3, BT549 and HeLa cells stably
expressing sh14-3-3γ, or shScr as a negative control. t, IB of WCL derived
from HeLa cells stably expressing shScr or sh14-3-3γ, synchronized in
M phase by nocodazole treatment and then released back into the cell
cycle for the indicated times.
Extended Data Figure 10 | Combination therapy of anti-PD-1 mAb and
CDK4/6 inhibitor in the MC38 colon cancer mouse model. a, Schematic of
the treatment plan for mice bearing subcutaneous MC38 tumors. Female
C57BL/6 mice were implanted subcutaneously with 0.1 × 10⁶ MC38 cells and
assigned to four treatment arms: control antibody, anti-PD-1 mAb,
CDK4/6 inhibitor, and anti-PD-1 mAb plus CDK4/6 inhibitor combination.
b, MC38 tumor-bearing mice were enrolled in the indicated treatment
groups. Tumor volumes of mice treated with control antibody (n = 15),
anti-PD-1 mAb (n = 15), the CDK4/6 inhibitor palbociclib (n = 14) or the
combined therapy (n = 12) were measured every three days and plotted
individually. We repeated this experiment twice. c, Kaplan-Meier
survival curves for each treatment group demonstrate the improved
efficacy of combining PD-1 mAb with the CDK4/6 inhibitor palbociclib.
*P < 0.05 (Gehan-Breslow-Wilcoxon test). We repeated this experiment
twice. d, e, g, i, Absolute numbers of CD3+, CD4+, CD8+, granzyme B+ or
IFNγ+ TILs in implanted MC38 tumors treated with the indicated agents,
analyzed by FACS. Control: n = 8; palbociclib: n = 10; PD-1 Ab: n = 9;
palbociclib & PD-1 Ab: n = 8. f, h, j, Percentage of CD4+ or CD8+ cells
among CD3+ TILs in implanted MC38 tumors treated with the indicated
agents, analyzed by FACS. Control: n = 8; palbociclib: n = 10; PD-1 Ab:
n = 9; palbociclib & PD-1 Ab: n = 8. k, A proposed working model
illustrating how PD-L1 protein stability is regulated by the cyclin
D/CDK4-SPOP-Cdh1 signaling pathway. Cyclin D/CDK4 negatively regulates
PD-L1 protein stability largely by phosphorylating its upstream
physiological E3 ligase SPOP, which promotes SPOP binding to 14-3-3γ and
thereby disrupts Cdh1-mediated destruction of SPOP. As such, CDK4/6
inhibitor treatment could unexpectedly elevate PD-L1 protein levels,
largely by blocking cyclin D/CDK4-mediated phosphorylation of SPOP and
thus promoting its degradation by APC/Cdh1. The unexpected rise in PD-L1
could present a serious clinical problem for patients receiving CDK4
inhibitor treatment and could be one of the mechanisms underlying CDK4
inhibitor resistance, through evasion of immune checkpoint surveillance.
Hence, our work provides a novel molecular mechanism as well as a
rationale for combining PD-L1 blockade with CDK4/6 inhibitors as a more
efficient anticancer clinical option. Error bars, ± s.d.; two-tailed
t-test; *P < 0.05, **P < 0.01, ***P < 0.001; NS, not significant.
Cellular senescence is a stress-responsive cell-cycle arrest program
that terminates the further expansion of (pre-)malignant cells¹.
Key signalling components of the senescence machinery, such
as p16INK4a, p21CIP1 and p53, as well as trimethylation of lysine 9
at histone H3 (H3K9me3), also operate as critical regulators of
stem-cell functions (which are collectively termed 'stemness'). In
cancer cells, a gain of stemness may have profound implications for
tumour aggressiveness and clinical outcome. Here we investigated
whether chemotherapy-induced senescence could change stem-cell-related
properties of malignant cells. Gene expression and
functional analyses comparing senescent and non-senescent B-cell
lymphomas from Eμ-Myc transgenic mice revealed substantial
upregulation of an adult tissue stem-cell signature, activated Wnt 
signalling, and distinct stem-cell markers in senescence. Using 
genetically switchable models of senescence targeting H3K9me3 or 
p53 to mimic spontaneous escape from the arrested condition, we 
found that cells released from senescence re-entered the cell cycle 
with strongly enhanced and Wnt-dependent clonogenic growth 
potential compared to virtually identical populations that had been 
equally exposed to chemotherapy but had never been senescent. 
In vivo, these previously senescent cells presented with a much 
higher tumour initiation potential. Notably, the temporary 
enforcement of senescence in p53-regulatable models of acute 
lymphoblastic leukaemia and acute myeloid leukaemia was found 
to reprogram non-stem bulk leukaemia cells into self-renewing, 
leukaemia-initiating stem cells. Our data, which are further 
supported by consistent results in human cancer cell lines and 
primary samples of human haematological malignancies, reveal 
that senescence-associated stemness is an unexpected, cell- 
autonomous feature that exerts its detrimental, highly aggressive 
growth potential upon escape from cell-cycle blockade, and 
is enriched in relapse tumours. These findings have profound 
implications for cancer therapy, and provide new mechanistic 
insights into the plasticity of cancer cells. 

Cellular senescence, which is implemented in response to severe 
cellular insults such as oncogenic activation or chemotherapeutic 
DNA damage, is a failsafe program that protects organismic integrity
by excluding potentially harmful cells from further expansion, and
also has a physiological function in tissue homeostasis during organ
development¹. Senescence has been shown to abrogate the pro-tumorigenic
potential of Ras-/Raf-driven (pre-)cancerous lesions, and to
contribute to the outcome of anticancer chemotherapy in vivo.


Notably, stem-cell functions, collectively referred to as 'stemness',
and senescence seem to be co-regulated by overlapping signalling
networks. Key senescence-relevant signalling molecules (for example,
Bmi-1, p16, p21 or p53) have critical roles in stem-cell
maintenance by preventing premature exhaustion (reviewed in ref. 3).
Senescence-enforcing p53 (also known as Trp53)-, Cdkn2a (also known
as Ink4a or Arf)- or Suv39h1-encoded gene products raise an initial
barrier to the efficient conversion of normal cells into induced
pluripotent stem cells (see refs 10, 11, and references therein),
suggesting an underexplored interplay between senescence- and
stemness-controlling signalling networks. Trimethylation of H3K9, as
mediated by the H3K9 methyltransferase Suv39h1 (ref. 12), confers
senescence by establishing a transcriptionally repressive
heterochromatin mark in the vicinity of S-phase-relevant E2F target
genes, and reflects an epigenetic principle linked to induced
pluripotent stem cell reprogramming. Using a cancer-unrelated,
inducible reprogramming mouse model in which many cells primarily
senesced, previous studies have shown that factors secreted from these
senescent cells facilitated the reprogramming of their neighbours.
Whether the senescence condition promotes cancer stemness, especially
in a cell-autonomous manner, is not known. Although a permanent
senescent cell-cycle block is per se incompatible with self-renewal, we
report here the senescence-evoked cell-intrinsic reprogramming of
cancer cells into a stem-like state, and the acquisition of
tumour-initiating potential after their forced release or spontaneous
escape from a chemotherapy-induced senescent cell-cycle arrest.

As indicated by their strong senescence-associated β-galactosidase
(SA-β-gal) activity and other previously demonstrated markers of
senescence, primary Eμ-Myc transgenic Bcl2-overexpressing
lymphomas (hereafter referred to as control;Bcl2 lymphomas) serve as a
well-established model for therapy-induced senescence (TIS). First,
we analysed stem-cell-related transcripts in the gene expression profiles
of 12 matched pairs of primary control;Bcl2 lymphomas that either
entered TIS after in vitro exposure to the chemotherapeutic agent
Adriamycin (ADR) or remained untreated. Using gene set enrichment
analysis (GSEA), we found that a previously established adult tissue
stem-cell (ATSC) signature was strongly skewed towards the TIS group,
but was not enriched in the equally ADR-treated but
senescence-incapable group of Suv39h1-deficient Eμ-Myc;Bcl2 (that is,
Suv39h1−;Bcl2) lymphomas (Fig. 1a and Extended Data Fig. 1a, b).
Almost the entire population turned double-positive for the stem-cell
antigen Sca1 and the senescence marker H3K9me3 upon senescence
induction (Fig. 1b, top). Furthermore, TIS cells, unlike non-senescent
Figure 1 | Therapy-induced senescent cancer cells acquire phenotypic
and functional stemness features. a, GSEA of an adult tissue stem-cell
profile (ATSC; top) in matched pairs of ADR-exposed versus untreated
control;Bcl2 lymphomas (n = 12; left) and Suv39h1−;Bcl2 lymphomas
(n = 5; right). TIS lymphomas display more than 80% SA-β-gal-positive
blue cells (representative photomicrographs from four independent
experiments). b, Co-expression of the stem-cell marker Sca1 and the
TIS marker H3K9me3 (top) in lymphoma cells as in a, and aldehyde
dehydrogenase (ALDH) activity with and without the ALDH inhibitor
diethylaminobenzaldehyde (bottom), by flow cytometry. Mean percentage
of positive cells ± s.d.; n = 5 biologically independent samples each.
c, Expression of the indicated stem-cell-related genes in various human
cancer cell lines or primary B-CLL samples by quantitative PCR (qPCR),
related to their ability to enter TIS (ADR-senescent, blue; non-senescent
despite ADR exposure, white; see Extended Data Fig. 1c for details).
Colours reflect fold induction (between ADR-treated and untreated
samples) from one representative out of three independent experiments
(cell lines) or four individual samples from patients with B-CLL.
Transcripts below the detection level are shown in light grey. d, GSEA of
the adult tissue stem-cell profile in the publicly available transcriptomes
of BRAFV600E-infected melanocytes, which senesce in response to Braf
activation (left; seven matched pairs), and of colon adenomas, which
are known to contain a large proportion of senescent cells (right; five
ApcMin/+ mouse adenoma biopsies and six healthy colon tissue samples).


cells, presented with increased aldehyde dehydrogenase (ALDH) and
ATP-binding cassette (ABC) transporter activities (Fig. 1b, bottom,
and Extended Data Fig. 1d), both typical properties of stem cells. When
assessing human malignancies of various origins, we found a notable
upregulation of stem-cell-related transcripts selectively in TIS-capable
cell lines as well as in samples from patients with primary B-cell chronic
lymphocytic leukaemia (B-CLL) (Fig. 1c and Extended Data Fig. 1c, e, f).
Moreover, the acquisition of stemness-related properties was also
observed during oncogene-induced and replicative senescence in cells of
various tissue types, including melanocytes, colon mucosa and breast
epithelial cells (Fig. 1d and Extended Data Fig. 1g). Hence, cancer cells
Figure 2 | Senescence-released (previously senescent) lymphomas
display higher tumour-initiating capacity than their never senescent
counterparts. a, Growth properties of conditionally senescent
Suv39h1−;Bcl2;Suv39h1-ER lymphoma cells after five days of
ADR + 4-OHT treatment (treatment) and subsequent passages in
4-OHT/ADR-free medium (post-treatment, p1-p2; each passage reflecting
seven days in culture), presented as proliferation (left; mean
BrdU/PI-marked S-phase fraction ± s.d., n = 5 biologically independent
samples; BrdU, 5-bromo-2'-deoxyuridine; PI, propidium iodide), SA-β-gal
staining (middle; mean positive cells ± s.d., n = 5 biologically
independent samples), and colony formation (right; quantified in b).
Flow microscopy images (bottom) of the fluorescent SA-β-gal mark
together with the proliferation marker EdU (passage 1 shown; see
Extended Data Fig. 2g for details) demonstrate the outgrowth of
senescent (SA-β-gal+) cells. Representative photomicrographs from four
independent experiments. b, Colony counts of lymphoma cells (treated as
in a) during extended serial passaging (p1-p14). Graphs show mean colony
numbers ± s.d.; n = 3 individual lymphomas. Two-tailed unpaired t-test
with Welch's correction, comparing ADR- and 4-OHT + ADR-pretreated cells
at p14. *P < 0.05. c, Tumour initiation after transplantation of different
numbers of Suv39h1−;Bcl2;Suv39h1-ER lymphoma cells pre-exposed to the
indicated treatments in vitro. Bars reflect numbers of lymphoma-bearing
mice out of 10 animals per group transplanted, within an observation
period of up to 100 days. P < 0.001 for comparing never senescent and
previously senescent groups (χ² test).


of mouse and human origin acquire novel stem-cell features upon 
entering cellular senescence. 
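Figures 2b and 3c use a two-tailed unpaired t-test with Welch's correction, which does not assume equal variances in the two groups. The sketch below computes the textbook Welch t statistic and Welch-Satterthwaite degrees of freedom with the Python standard library only; the colony counts are hypothetical, not data from Fig. 2b, and turning t into a P value additionally needs a t-distribution CDF (SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the complete test).

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    se2_a = variance(a) / len(a)  # squared standard error of group a
    se2_b = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

# Hypothetical colony counts (n = 3 per group)
never_senescent = [10, 12, 14]
previously_senescent = [20, 25, 30]
t, df = welch_t(never_senescent, previously_senescent)
print(round(t, 3), round(df, 3))
```

Because the degrees of freedom shrink when one group is much more variable than the other, Welch's version is the safer default for small groups such as n = 3 lymphomas.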

To test whether senescence-associated stemness (SAS) translates 
into different tumour behaviour upon release from the division block, 
we generated switchable model systems (using 4-hydroxytamoxifen 
(4-OHT)-inducible essential senescence mediators Suv39h1 or p53) 
that can enter full-featured senescence with increased levels of stem- 
cell-related transcripts and proteins only when exposed to both 4-OHT 
and ADR (Fig. 2a and Extended Data Fig. 2a-c). After changing to 
ADR- and 4-OHT-free medium to switch Suv39h1 or p53 off again, 
single-cell analyses revealed that senescent cells resumed sustainable 
proliferation within a few days; that is, they became first double- 
positive for the retained fluorescence-based senescence marker (a vital 
stain) and 5-ethynyl-2’-deoxyuridine (EdU) incorporation, indicating 
restarted DNA synthesis (with the proliferation-repressive H3K9me3 
mark gradually vanishing), before SA-β-gal activity was eventually
lost and S-phase activity fully regained (Fig. 2a and Extended Data
Fig. 2d-g). Therefore senescence is, in principle, a reversible condition,
which becomes evident when essential senescence maintenance genes
are no longer expressed. Importantly, serial replatings in colony-formation
experiments of such previously senescent cells led to
significantly more colonies than the aliquot of never senescent cells
Figure 3 | Canonical Wnt signalling, activated in TIS, is an essential
driver of the enhanced tumour initiation capacity exhibited by
senescence-released tumour cells. a, Co-expression of the fluorescent
SA-β-gal marker and β-catenin in ADR-exposed control;Bcl2 or
TIS-incapable Suv39h1−;Bcl2 lymphoma cells (left), and corresponding
β-catenin transcriptional activities measured as relative TOPflash T-cell
factor (TCF) reporter signals with FOPflash as a TCF-binding-site
mutant control (right). Mean percentage of double-positive cells or mean
relative light units fold change (between ADR-treated and untreated
samples) ± s.d., respectively (n = 4 biologically independent samples each).
The inset shows a representative photomicrograph from four independent
experiments. b, Colour-coded heat map reflecting fold change
(between previously senescent and never senescent cells) of permissive
H3K4me3 and repressive H3K27me3 histone marks at the promoters
of indicated ATSC- or Wnt-related (asterisks) genes by chromatin
immunoprecipitation (n = 3 biologically independent samples). c, Colony
formation of never senescent (NS) versus previously senescent (PS)
Suv39h1−;Bcl2;Suv39h1-ER lymphomas (passage 2, compare with Fig. 2),
exposed to the pharmacological Wnt inhibitors (ICG-001, salinomycin) or

of the same lymphoma treated with the same dose of chemotherapy, 
reflecting the now unleashed stemness properties acquired as a latent 
program during senescence (Fig. 2a, b). The enhanced colony-founding 
potential of previously senescent cells was stable over an extended 
observation period of up to 100 days (reflecting 14 serial replatings; 
Fig. 2b). Similar results were obtained with p53-ER as another
inducible senescence gatekeeper; with γ-irradiation as an alternative
senescence trigger; with ADR-exposed human lymphoma cell lines;
and with colon cancer cells representing a solid, epithelial cancer type
(Extended Data Fig. 3a-f). It is noteworthy that previously senescent
cells typically retained the ability to re-enter TIS when re-exposed
to 4-OHT and ADR, indicating that no selection for senescence-
compromising mutations occurred in previously senescent cells
(Extended Data Fig. 3g). Previously, an instructive, non-cell-autonomous
role has been attributed to the senescence-associated secretory
phenotype (SASP; reviewed in ref. 2) in models of inducible
reprogramming and tissue regeneration; however, our observations, made in
pure, homotypic tumour cell populations, even under drastic reduction 
of SASP factor expression, favour a largely cell-intrinsic mechanism 
of senescence-associated reprogramming (Extended Data Fig. 4). 
Although we cannot completely exclude alternative explanations, 
these and the subsequent data strongly favour senescence-associated 
stemness as the most compelling and consistent interpretation of the 
observations presented. 
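Figure 3a reports β-catenin transcriptional activity as the TOPflash signal relative to the FOPflash (TCF-binding-site mutant) control, expressed as fold change between ADR-treated and untreated samples. One plausible way to compute this is sketched below, assuming raw luciferase readings with no additional normalization (for example to a co-transfected Renilla control, which the caption does not mention); the function names and readings are hypothetical.

```python
from statistics import mean, stdev

def top_fop_ratio(top_rlu, fop_rlu):
    """TCF reporter activity: TOPflash signal over FOPflash background."""
    return top_rlu / fop_rlu

def fold_change(treated, untreated):
    """Each argument is a (TOPflash RLU, FOPflash RLU) pair for one sample."""
    return top_fop_ratio(*treated) / top_fop_ratio(*untreated)

# Hypothetical matched readings for three biological replicates:
# (ADR-treated pair, untreated pair)
replicates = [
    ((9000.0, 1000.0), (3000.0, 1000.0)),
    ((8000.0, 1000.0), (2000.0, 1000.0)),
    ((10000.0, 1000.0), (5000.0, 1000.0)),
]
folds = [fold_change(adr, untreated) for adr, untreated in replicates]
print(f"mean fold change {mean(folds):.2f} ± s.d. {stdev(folds):.2f}")
```

Dividing by the FOPflash signal first removes TCF-independent background, so the fold change reflects TCF-dependent β-catenin activity rather than overall reporter expression.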

Enrichment assays between matched pairs of never senescent versus 
previously senescent lymphomas confirmed the higher growth 
shRNA against Ctnnb1 (shCtnnb1) for 7 days. Results reflect mean
colony numbers ± s.d. (n = 3 biologically independent samples).
Two-tailed unpaired t-test with Welch's correction, *P < 0.05.
d, Survival of mice transplanted with matched previously senescent or
never senescent cells and treated with the indicated Wnt inhibitors upon
palpable lymphoma formation. Cells carrying shRNA against Ctnnb1 were
infected with the shRNA before transplantation. Boxes frame the 25th to
75th percentile range, with median, minimal and maximal values (n = 6
mice per treatment group). Two-tailed, paired t-test, *P < 0.05.
e, Expression of Wnt target genes (by qPCR) in matched cases of
control;Bcl2 lymphomas before and after relapse from senescence-inducing
cyclophosphamide treatment in vivo (mean fold change ± s.d., n = 4
biologically independent samples). f, Nuclear β-catenin expression by
immunostaining of lymph nodes from control;Bcl2 lymphoma-bearing mice
as in e (left; n = 4 biologically independent samples), and human DLBCL
biopsies from the same individual patients at diagnosis and at relapse
after first-line induction chemotherapy (right; n = 5 independent
patients). Mean percentage of positive cells ± s.d.; two-tailed, paired
t-test, *P < 0.05. Representative photomicrographs; scale bars, 100 μm
(magnified insets, 10 μm).


competitiveness of previously senescent lymphomas both in vitro and 
in vivo (Extended Data Fig. 2h). Importantly, in vivo tumour initia- 
tion experiments found previously senescent lymphomas produced 
malignancies at much lower transplanted cell numbers in immune- 
competent recipient mice when compared to never senescent lymphomas 
(Fig. 2c). Taken together, the SAS program exerts its detrimental effect 
on tumour initiation upon release from TIS, thereby unmasking an 
unexpected tumour-promoting capability of the senescence program. 

To test which key stemness pathways drive SAS, we used GSEA 
in ADR-exposed control;Bcl2 versus Suv39h1~;Bcl2 lymphomas 
for numerous gene sets related to Notch, Hedgehog, and canoni- 
cal and non-canonical Wnt signalling. Canonical Wnt and, to some 
extent, Notch signalling, appeared to be significantly enriched in TIS 
(Extended Data Fig. 5a, b). Because Wnt signalling plays a central role 
in stem-cell renewal in many tissues including the haematopoietic
compartment, induces Notch signalling, and is required for cancer stem
cell development in haematological malignancies, we considered activation
of the Wnt cascade as the putative driver behind the newly acquired
stemness features in TIS lymphomas. Indeed, we detected enhanced,
predominantly nuclear expression and transcriptional activation of
β-catenin in control;Bcl2 but not in Suv39h1−;Bcl2 lymphomas, as well
as in TIS-capable human cancer cell lines after ADR treatment (Fig. 3a,
Extended Data Fig. 2b and Extended Data Fig. 5c, d). Independent
of Wnt ligand-receptor stimulation, we identified inhibition of the
β-catenin degradation-promoting glycogen synthase kinase 3β (GSK3β)
via activated MEK-MAPK and PI3K-Akt signalling—which is
Figure 4 | Cellular senescence catalyses de novo reprogramming of
non-stem bulk leukaemia cells into leukaemia-initiating cells.
a, Stemness-related features in conditionally senescent mouse
KrasG12D;DOX-shp53-GFP;Bcl2 bulk leukaemia cells (Lin−Kit+Sca1+-depleted)
treated for five days with ADR + doxycycline (DOX). Senescence induction
is demonstrated by SA-β-gal staining (top); expression of the stem-cell
markers Kit and Sca1 was analysed by flow cytometry (middle), and
relative expression of the indicated transcripts by qPCR (bottom).
Numbers reflect mean percentages of positive cells (top, middle) or
average fold induction (bottom) ± s.d. (n = 3 biologically independent
samples). b, Tumour initiation capacity of bulk leukaemia cells
pretreated in vitro as in a,


typically upregulated in senescence—as the cell-autonomous driver
of the Wnt program (Extended Data Fig. 6). The implementation of
the Wnt program was further promoted by epigenetically permissive
remodelling at promoters of stem-cell- and Wnt signalling-related
genes in previously senescent as compared to never senescent cells
(Fig. 3b). Accordingly, we found that the increased colony-forming
potential of previously senescent lymphoma or colon cancer cells
was dependent on Wnt signalling, as genetic or pharmacological
disruption of the Wnt-β-catenin cascade (without preventing TIS or
profoundly affecting cell viability) neutralized the higher clonogenicity
of previously senescent cells (Fig. 3c and Extended Data Fig. 7a-d).
In contrast to the never senescent cell population, a rarely dividing
and strongly β-catenin-expressing subpopulation was detectable in
the previously senescent cells only, and was maintained at a stable
steady state, explaining the lastingly enhanced colony-forming potential
of previously senescent compared to never senescent cells (Extended
Data Fig. 8). Consistently, when previously senescent and never
senescent cells were propagated in mice, the previously senescent state
translated into shortened survival, whereas exposure to
Wnt inhibitors in vivo or stable lymphoma cell transduction with a
construct expressing short hairpin RNA (shRNA) against Ctnnb1
(which encodes β-catenin) improved the poor long-term outcome
of mice harbouring previously senescent lymphomas (Fig. 3d and
Extended Data Fig. 7b, e, f).

Importantly, cell-cycle re-entry out of TIS, a prerequisite for exerting
stem-cell potential, is not limited to conditional, switchable systems,
but may, as a rare event, spontaneously occur in control;Bcl2 lymphomas,
as demonstrated by the emergence of EdU-co-positive cells out of a
solely SA-β-gal-positive senescent cell population (Extended Data
Fig. 9). Given their stem-cell potential, we postulated that
β-catenin-positive previously senescent cells might be enriched in
lymphomas that progressed after chemotherapy. Hence, when comparing
primary control;Bcl2 lymphomas before therapy with the same individual
lymphomas that had relapsed after exposure to senescence-inducing
cyclophosphamide chemotherapy in vivo, we found a much higher
fraction of cells positive for nuclear β-catenin in relapse lymphomas
that also presented with higher expression levels of Wnt target genes
(Fig. 3e, f, left). Moreover, longitudinally matched biopsy pairs from the
cultivated in ADR-free/DOX-supplemented medium for an additional
two passages and transplanted at the indicated cell numbers. Lin™ cells
were propagated without ADR. Numbers indicate leukaemia-bearing mice
out of six animals per group transplanted, within an observation period
of up to 100 days (n = 6 mice per treatment group). c, Flow cytometry
plots showing peripheral blood phenotyping of mice transplanted as
in b. The GFP+ leukaemia cells are depicted in green. The insets show
photomicrographs of peripheral blood smears stained with haematoxylin
and eosin, showing leukaemic blasts (typically not detectable in never
senescent recipients). One representative out of three independent
experiments shown.


same individual patients diagnosed with diffuse large B-cell lymphoma 
(DLBCL) before chemotherapy and at disease recurrence revealed 
significantly more nuclear 3-catenin-positive tumour cells in the pre- 
viously chemotherapy-exposed, re-emerging samples (Fig. 3f, right), 
further supporting a link between activated Wnt signalling in relapsed 
tumours and senescence-related tumour cell reprogramming. Taken 
together, TIS-associated stemness reflects a Wnt-governed capability 
that is stably maintained in a reprogrammed, hierarchically organized 
subpopulation of post-senescent tumour cells and critically associated 
with tumour progression and treatment failure. 

As presumably applies to various human tumours, including aggressive
lymphomas, Eμ-Myc transgenic mouse lymphomas do not originate
from a distinct fraction of cancer stem cells, because almost all
lymphoma cells possess tumour-initiating potential in this model.
Consequently, we next asked whether cellular senescence might
account for the reprogramming of non-stem tumour cells into cancer
stem cells in tumour types in which the tumour-initiating capacity
is confined to a rare subpopulation. We isolated a non-self-renewing
population of leukaemia cells from a mouse model of T-cell acute
lymphoblastic leukaemia (T-ALL) driven by oncogenic KrasG12D and
conditional inactivation of p53 via a doxycycline-controlled shRNA
(shp53) (Extended Data Fig. 10a). ADR exposure induced senescence
in the majority of non-stem leukaemia cells only if p53 expression was
not suppressed (Fig. 4a, top). This group exhibited a significant
conversion to Kit+Sca1+ cells, indicative of putative leukaemia stem cells
(P = 0.02, compared to ADR-exposed but p53-deficient cells; Fig. 4a,
middle), and higher expression of stem-cell-related transcripts (Fig. 4a,
bottom). Upon release from TIS by knockdown of p53, these leukaemia
cells resumed proliferation (thereby becoming previously senescent
cells) and formed significantly more colonies than their
equally ADR-treated never senescent leukaemia counterparts that
remained p53-inactive throughout the experiment (Extended Data
Fig. 10b). As reported for TIS lymphomas, cells with nuclear β-catenin
expression were almost exclusively detectable in the senescent
leukaemia cell population, and Wnt inhibitors completely neutralized
the increased colony formation potential of their previously senescent
progeny (Extended Data Fig. 10c, d). Most importantly, almost
all samples of previously senescent cells, and nearly none of the
samples of never senescent cells, initiated leukaemias in recipient
mice (P = 0.0275, comparing previously senescent and never senescent
groups); as expected, all Lin− transplants gave rise to leukaemias
(P < 0.001, comparing Lin− and never senescent groups; Fig. 4b, c).
Notably, and further adding to SAS in oncogene-induced senescent
colon mucosa cells or melanocytes (compare with Fig. 1d), TIS
reprogramming is not restricted to cells of lymphoid origin, as demonstrated
for an acute myeloid leukaemia (AML) mouse model, culture-established
human AML cells, and primary human leukaemic blast
samples obtained at diagnosis from patients with AML (Extended Data
Fig. 10e–l). Thus, cellular senescence is not only associated with
additional stem-cell features in tumour cells with pre-existing self-renewal
capability, but also catalyses the cell-autonomous reprogramming of
non-stem bulk tumour cells of lymphoid and non-lymphoid origin
into de novo cancer stem cells.

We present here an unexpected cell-intrinsic link between the 
senescence program and the acquisition of self-renewing properties, 
which we postulate serves as a physiological rescue mechanism in 
development and tissue homeostasis. We and others have observed 
that senescence not only occurs in critically stressed cells, but also may 
spread to adjacent cells via SASP components in a paracrine fashion 
(ref. 25; J.R.D. and C.A.S., unpublished observations). We propose that 
nature equipped normal cells with a latent SAS capacity (compare with 
Extended Data Fig. 1g) to counter the imminent loss of an entire tissue 
compartment due to pro-apoptotic and pro-senescent stresses: in rare 
cells spontaneously re-entering the cell cycle when threatening stresses 
no longer apply, SAS may become a tissue-replenishing principle. In 
a neoplastic context, cellular senescence—particularly in tumour cells 
with apoptotic defects—appears to be primarily a beneficial response 
by keeping tumour growth in check. However, post-senescent cells with 
‘hijacked’ SAS exert their detrimental potential at relapse by driving a 
much more aggressive growth phenotype. Therefore, pharmacological
strategies that specifically eliminate senescent cells before a fraction of
them can exploit their acquired stemness capacity become a critical
therapeutic need, as previously reported by us regarding cancer and by
others regarding ageing-related pathologies.


Online Content: Methods, along with any additional Extended Data display items and
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Generation of primary mouse lymphomas and leukaemias, and use of primary
human B-cell lymphoma, B-CLL and AML samples. All animal protocols used in
this study were approved by the governmental review board (Landesamt Berlin)
and conform to the respective regulatory standards. Lymphomas with defined
genetic defects were generated by intercrossing Eμ-Myc transgenic mice to mice
carrying loss-of-function alleles at the Suv39h1 locus or to mice harbouring
a 4-OHT-inducible p53-ERTAM knock-in allele, encoding a p53–oestrogen receptor
fusion protein that is inactive in the absence of 4-OHT, all in a C57BL/6
background. Eμ-Myc transgenic lymphomas that formed in Eμ-Myc;p53-ERTAM/+
mice with an allelic loss of the remaining p53 wild-type allele were designated
p53-ERTAM lymphomas. Suv39h1− lymphomas reflect Eμ-Myc lymphomas that
arose in Suv39h1−/− females or, owing to the X-linkage of the Suv39h1 locus, in
Suv39h1−/Y males. Genotyping of the offspring by allele-specific genomic PCR,
monitoring of lymphoma onset and isolation of viable lymphoma cells were carried
out as described. KrasG12D;shp53-GFP-induced T-cell acute lymphoblastic leukaemias
(T-ALL) with tetracycline (that is, doxycycline)-dependent shp53 expression
(DOX-on) were generated and isolated following a previously published
protocol with minor modifications. The NrasG12D/MLL-AF9-driven mouse
model of acute myeloid leukaemia (AML), co-expressing a reverse tetracycline
transactivator (‘Tet-on competent’), was generated as previously described.
Six- to eight-week-old C57BL/6 (‘wild type’) female mice were used as recipients for
in vivo lymphoma or leukaemia propagation. No randomization or blinding was
used to allocate experimental groups.

The use of tumour biopsies (that is, bone marrow aspirates, lymph-node biopsies
or peripheral blood samples obtained for the initial diagnosis or follow-up analyses
of patients with B-cell chronic lymphocytic leukaemia (B-CLL), diffuse large B-cell
lymphoma (DLBCL) or acute myeloid leukaemia (AML)) as anonymous samples after
informed patient consent was approved by the local ethics commission of the
Charité – Universitätsmedizin Berlin (reference EA4/085/07 and EA4/061/11).

Cell culture, plasmids and retroviral gene transfer. Isolated mouse lymphoma
cells and primary human AML samples (tumour-cell-purified by Ficoll density-
gradient centrifugation and red cell lysis) were short-term cultured in
standard medium on irradiated NIH3T3 fibroblast feeders. Primary human B-cell
malignancies were cultivated in a ‘CD40 system’, that is, in the same medium
further supplemented with 100 IU ml−1 of recombinant human interleukin-4
(Peprotech) on irradiated NIH3T3 cells stably expressing the human CD40 ligand.
Human cancer cell lines were obtained from DSMZ (Leibniz-Institut Deutsche
Sammlung von Mikroorganismen und Zellkulturen GmbH), ATCC or Biomol:
RCK8 (DSMZ; ACC-561), Eheb (DSMZ; ACC-67), K562 (DSMZ; ACC-10), Mec1
(DSMZ; ACC-497), Molm13 (DSMZ; ACC-554), SW480 (DSMZ; ACC-313),
LS174T (DSMZ; ACC-759), DLD-1 (DSMZ; ACC-278), Caco-2 (DSMZ; ACC-169),
SKMel28 (ATCC; HTB-72), MeWo (ATCC; HTB-65), WM266.4 (Biomol;
WM266-4-01). Omm2.3 cells were provided by Martina J. Jager. The cells were
cultivated according to the supplier’s recommendations and regularly tested for
mycoplasma contamination. The cell lines bought within the last four years were not
additionally authenticated (RCK8, Eheb, Mec1). All other cell lines were
authenticated by DSMZ using a single-nucleotide polymorphism-based multiplex
approach in October 2017. Single-nucleotide polymorphism profiles matched known
profiles or were unique (Omm2.3). Retroviral supernatants, generated by transient
transfection of Phoenix-Eco packaging cells with murine stem-cell retrovirus
(MSCV)-based constructs, were used to stably infect Eμ-Myc transgenic lymphomas,
KrasG12D;shp53-GFP T-ALL cells, NrasG12D/MLL-AF9 AML cells or human
cancer cell lines (engineered to express the ecotropic virus receptor as described).
Freshly isolated cells were first infected with an MSCV retrovirus encoding murine
or human Bcl2 and a blasticidin antibiotic resistance gene. Bcl2-overexpressing
Eμ-Myc;Suv39h1− lymphomas were subsequently infected with Suv39h1-ER
cDNA, encoding murine full-length Suv39h1, fused in frame with the coding
sequence of a 4-OHT-inducible oestrogen receptor mutant (see ref. 39),
subcloned into MSCV-IRES-GFP or MSCV-IRES-DsRed vectors. GFP- or
DsRed-positive cells were purified in a fluorescence-activated cell sorter (FACS Aria II, BD
Biosciences). TOPflash and FOPflash reporter constructs (reflecting the wild-type
or mutant TCF-binding promoter region followed by a firefly luciferase-encoding
cDNA) were subcloned from the original pGL3 vector into a self-inactivating
MSCVsn-DsRed plasmid, stably transferred into mouse lymphoma cells or
human cell lines (expressing the ecotropic virus receptor), and flow-sorted for
DsRed-positive cells. NF-κB inactivation was achieved by stable overexpression of
an IκBαΔN construct (NF-κB super-repressor (NF-κB-SR)) in control;Bcl2 cells
as reported previously. Wnt pathway activation was achieved by transducing
control;Bcl2 lymphomas with a stabilized murine β-catenin (encompassing
an N-terminal 90-amino acid deletion, ΔNβ-catenin)-encoding MSCV-IRES-GFP
retrovirus. To stably knock down β-catenin expression, a previously
published shRNA sequence was subcloned into the pSuperRetro plasmid to infect
Suv39h1−;Bcl2;Suv39h1-ER cells. An MSCVgy-based construct containing a
miR30-shRNA against murine p53 under a tetracycline-dependent promoter was
used to transfect NrasG12D/MLL-AF9;Bcl2 cells. Stable TP53 knockdown in human
cell lines RCK8, Molm-13 and LS174T was achieved by lentiviral transduction
with a previously published shRNA against p53 (ref. 43) in the pLKO.1-puro vector
(Addgene plasmid 19119).

In vitro and in vivo treatments. For the induction of cellular senescence in vitro,
Adriamycin (ADR; Sigma), a topoisomerase II inhibitor widely used in the clinic
to treat lymphomas and other malignancies, was added once at a concentration
of 0.05 μg ml−1 in all experiments, with the following exceptions: Eheb, Mec1,
Molm13 and RCK8 cell lines, treated with 0.01 μg ml−1 ADR, and the K562 cell line,
treated with 0.025 μg ml−1 ADR. For conditional activation of the p53-ERTAM or
Suv39h1-ER fusion constructs, the cells were additionally exposed over five days to 1 μM 4-OHT
(Sigma) or the equivalent volume of the ethanol-based solvent. Cellular senescence
was assessed after five days of treatment. Pharmacological inhibition of the Wnt
pathway or kinases involved in modulating Wnt signalling was performed by adding
small-molecule inhibitors to cells for the final 48 h of the senescence-inducing
ADR + 4-OHT treatment: Wnt inhibitors ICG-001 (10 μM; Enzo Life Sciences)
and salinomycin (1 μM; Sigma), MAPK inhibitor PD325901 (10 nM; Selleckchem),
MEK inhibitor PD98059 (25 μM; Selleckchem), PI3K inhibitor LY294002 (10 μM;
Sigma-Aldrich), Akt inhibitor MK-2206 (200 nM; Selleckchem) or GSK3β
inhibitor CHIR99021 (1 μM; Sigma-Aldrich). For Wnt-modulating treatments upon
senescence release, passage-2 never senescent and previously senescent cells were
used (that is, ADR + 4-OHT-pretreated Eμ-Myc;Suv39h1−;Bcl2;Suv39h1-ER
cells, further propagated in 4-OHT/ADR-free medium for 14 days). Matched pairs
of previously senescent and never senescent cells were exposed to Wnt inhibitors
as described above, or to recombinant mouse Wnt3a (10 ng ml−1; R&D Systems),
recombinant mouse R-Spondin 2 (Rspo2; 20 ng ml−1; R&D Systems), a combination
of the two ligands (at the same concentrations as for single treatments) or
the GSK3β inhibitor CHIR99021 (1 μM; Sigma-Aldrich), for 48 h for
gene expression analysis or for seven days (in methylcellulose medium) for
colony formation assessment. The doxycycline (DOX)-dependent activation of an
shRNA against p53 in mouse KrasG12D;shp53-GFP T-ALL or NrasG12D/MLL-AF9
AML samples was achieved by supplementing the culture medium with 1 μg ml−1
of doxycycline (Sigma).
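The ADR doses above are given as mass concentrations. For comparison with molar dosing, a minimal sketch of the unit conversion, assuming the drug is doxorubicin hydrochloride with a molar mass of about 579.98 g/mol (the salt form is an assumption, not stated in the text); `ug_per_ml_to_nM` is a hypothetical helper:

```python
# Assumption: ADR is doxorubicin hydrochloride, molar mass ~579.98 g/mol.
DOXORUBICIN_HCL_G_PER_MOL = 579.98


def ug_per_ml_to_nM(conc_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration (ug/ml) to a molar concentration (nM)."""
    grams_per_litre = conc_ug_per_ml * 1e-3  # 1 ug/ml equals 1 mg/l equals 1e-3 g/l
    molar = grams_per_litre / molar_mass_g_per_mol  # mol/l
    return molar * 1e9  # mol/l to nmol/l


if __name__ == "__main__":
    doses = [("default", 0.05), ("K562", 0.025), ("Eheb, Mec1, Molm13, RCK8", 0.01)]
    for label, conc in doses:
        nM = ug_per_ml_to_nM(conc, DOXORUBICIN_HCL_G_PER_MOL)
        print(f"{label}: {conc} ug/ml ~ {nM:.1f} nM")
```

Under this assumption, 0.05, 0.025 and 0.01 μg ml−1 correspond to roughly 86, 43 and 17 nM, respectively.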

For in vivo experiments, 1 × 10° Eμ-Myc;Suv39h1−;Bcl2;Suv39h1-ER
lymphoma or 5 × 10° KrasG12D;shp53-GFP T-ALL leukaemia cells (or 1 × 10°
lineage-negative (Lin−) cells as a positive control), if not otherwise indicated, were
transplanted by tail-vein injection into immunocompetent recipient mice. In the case of
KrasG12D;shp53-GFP T-ALL leukaemia samples, recipient mice were irradiated
with 6 Gy 24 h before transplantation. DOX was supplied with the drinking water
(20 mg ml−1; exchanged twice a week) and in food pellets (200 mg kg−1 of
regular chow). Leukaemia manifestation was diagnosed by flow cytometry-based
detection of GFP-positive cells in the peripheral blood at the time mice presented
with general signs of pre-terminal sickness (greater than 20% weight loss or other
symptoms of severe sickness). If no signs of sickness were noted, the experiments
were ended at 70% tumour burden in peripheral blood. Lymphoma formation was
diagnosed when palpable lymph-node enlargements had formed. A tumour size
of 16 mm (corresponding to approximately 4 lymph nodes of 4 mm in diameter)
was approved by Landesamt Berlin as an experiment end-point criterion and was
not exceeded in any of the performed experiments. ICG-001 and salinomycin
were applied intraperitoneally daily (both at a dose of 10 mg kg−1 body weight),
starting from palpable lymphoma formation until a pre-terminal disease stage was
reached. Time-to-death was defined as the latency between transplantation and
a pre-terminal disease stage. Upon CO2 euthanasia, single-cell suspensions were
isolated from enlarged organs as described previously.

Analysis of growth parameters, viability, stem-cell and senescence markers.
Cell-cycle analysis by 5-bromo-2′-deoxyuridine/propidium iodide (BrdU/PI)-based
flow cytometric measurement was performed as described previously. Cytospin
preparations of suspension cultures for subsequent SA-β-gal analyses or
immunostainings were carried out as described previously. Carboxyfluorescein
succinimidyl ester (CFSE) labelling was performed on day 3 after starting ADR + 4-OHT
treatment, using the CellTrace Far Red Cell Proliferation Kit for flow cytometry
(Molecular Probes, C34564) according to the manufacturer’s recommendations.
CFSEhigh cells were sorted on treatment day 5 on an S3e Cell Sorter (Bio-Rad).
For β-catenin co-staining, CFSE-labelled cells were fixed in 4% paraformaldehyde,
permeabilized by saponin in 1% bovine serum albumin (LifeTechnologies,
10635), stained with Alexa Fluor 488 mouse anti-β-catenin antibody according to
the manufacturer’s recommendations (BD Pharmingen, 562505), and acquired
on an ImageStreamX Mark II Imaging Flow Cytometer (Amnis, MerckMillipore).
EdU labelling was performed on treatment day 5 using the Click-iT EdU Pacific
Blue Flow Cytometry Assay Kit according to the manufacturer’s recommendations
(Molecular Probes, C10418). For the fluorescent SA-β-gal labelling, cells
were incubated in 75 μM chloroquine solution for 1 h followed by exposure to the
C12FDG substrate (5-dodecanoylaminofluorescein di-β-D-galactopyranoside;
ImaGene Green C12FDG lacZ Gene Expression Kit, Molecular Probes, 12904) for
20 min at 37 °C in PBS (pH 5.5, with 1 mM MgCl2) and analysed on the ImageStreamX
Mark II Imaging Flow Cytometer. Cell viability was evaluated by annexin V
(BD Pharmingen, 556419) and propidium iodide (5 μg ml−1, Sigma-Aldrich)
staining, analysed in a FACSCalibur flow cytometer (BD Biosciences). Viable
cells were detected as annexin V/propidium iodide-double-negative. ABC
transporter activity was analysed using the eFluxx-ID Gold multidrug resistance kit
(Enzo Life Sciences), and ALDH activity using the ALDEFLUOR kit (StemCell
Technologies), according to the manufacturer’s instructions. Colony-forming
unit assays were performed by plating 10” or 10° cells in 1 ml of methylcellulose 
medium (MethoCult M3134 for mouse cells, or H4100 for human cells, Stem Cell 
Technologies). For mouse cells, the medium was supplemented with recombinant 
murine interleukin (IL)-3 (1 ng ml−1, Miltenyi), recombinant murine IL-6 (10 ng ml−1,
Miltenyi), recombinant murine IL-7 (0.1 ng ml−1, Peprotech), and recombinant
murine stem-cell factor (SCF, 50 ng ml−1, Peprotech). For the indicated assays,
the medium was further supplemented with ADR (0.05 μg ml−1), 4-OHT (1 μM),
DOX (1 μg ml−1), ICG-001 (10 μM), salinomycin (1 μM), Wnt3a (10 ng ml−1),
Rspo2 (20 ng ml−1) or GSK3β inhibitor CHIR99021 (1 μM). Clusters of more
than 50 cells were scored as colonies, using bright-field or fluorescence
microscopy. For serial passaging, cells were washed out of methylcellulose with warm
PBS after seven days (mouse B-cell lymphoma cells) or ten days (mouse T-ALL 
cells), counted and plated in fresh methylcellulose medium (10? or 10° cells per 
ml). Regarding luciferase-based Wnt reporter assays, cells stably transfected with
TOPflash-MSCVgin or FOPflash-MSCVgin were ADR-exposed in a senescence-
inducing schedule as described above. The luminescence signals were measured
with the ONE-Glo kit (Promega) according to the manufacturer’s instructions and
normalized to viable cell counts. For depletion of Lin− cells from KrasG12D;shp53-GFP
T-ALL samples or NrasG12D/MLL-AF9 AML samples, cells were labelled with
a cocktail of biotinylated lineage marker antibodies (BD Biosciences, 559971)
followed by Streptavidin-PE (BD Biosciences, 554061). GFP+PE+ cells were
flow-sorted in a FACS Aria II (BD Biosciences). For depletion of CD34+ cells from the
Bcl2-transfected Molm-13 cell line, cells were stained with a directly conjugated
anti-CD34-APC antibody (1:200, BD Biosciences, 560940), and CD34− cells were
sorted in a FACS Aria II (BD Biosciences).

RNA-based expression analysis. For microarray-based gene expression profiling
of untreated or five-day-ADR-exposed control;Bcl2 or Suv39h1−;Bcl2 lymphomas,
RNA was isolated and processed as previously reported.

The list of 5,401 probe sets differentially expressed between untreated and
ADR-treated control;Bcl2 lymphomas was determined by analysis of variance (ANOVA,
cut-off at q < 0.05). The list of filtered genes was ranked according to expression
fold changes, and the genes belonging to the ATSC or core embryonic stem-cell
signature were marked in orange and blue, respectively.
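The cut-off above is expressed as a q value, but the text does not name the multiple-testing correction. A minimal sketch, assuming the Benjamini–Hochberg procedure (an assumption, not the authors' stated method), of how per-probe-set ANOVA P values could be converted to q values and filtered at q < 0.05; both function names are hypothetical:

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P down so that q values stay monotone in rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def filter_probe_sets(pvals: Dict[str, float], cutoff: float = 0.05) -> List[str]:
    """Names of probe sets whose q value falls below the cut-off."""
    names = list(pvals)
    qvalues = benjamini_hochberg([pvals[n] for n in names])
    return [n for n, q in zip(names, qvalues) if q < cutoff]
```

The running minimum taken from the largest P value downwards is what keeps q values monotone and capped at 1.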

Gene set enrichment analysis (GSEA) was performed with the GSEA v2.0
software (Broad Institute of MIT (Massachusetts Institute of Technology) and
Harvard, http://www.broad.mit.edu/gsea) on transcriptome data produced in
our laboratory (GSE31099 and GSE44355) or on publicly available transcriptome
datasets downloaded from the Gene Expression Omnibus (GEO;
https://www.ncbi.nlm.nih.gov/geo/): normal colon epithelium and colon adenomas from
ApcMin/+ mice (GSE422, samples GSM6191-GSM6201), Braf-V600E-infected
human melanocytes (GSE46801), human mammary epithelial cells in
p16INK4a-dependent stasis or telomere shortening-induced agonescence (GSE16058),
normal human foreskin BJ fibroblasts in replicative senescence (GSE13330,
samples GSM336385-GSM336628) and normal human mesenchymal stem cells in
replicative senescence (GSE9593, samples GSM242185, GSM242668, GSM242669
and GSM242672-GSM242674). The gene sets tested were taken without further
change from the indicated publications, downloaded from the Molecular Signatures
Database (MSigDB) of the Broad Institute
(http://software.broadinstitute.org/gsea/msigdb/collections.jsp) or from the Gene
Ontology (GO) browser AmiGO (GO ‘Cell cycle process’ (GO:0022402), GO ‘Wnt
signaling pathway’ (GO:0016055), GO ‘Canonical Wnt receptor signaling’ (GO:0060070),
GO ‘Noncanonical Wnt signaling’ (GO:0035567), GO ‘Notch signaling pathway’
(GO:0007219), GO ‘Smoothened signaling pathway’ (GO:0007224)), or generated from the gene list
reflecting the Mouse Wnt Signalling Pathway PCR Array (SA Biosciences; genes
from this list annotated to have a role in cell growth and proliferation were used as a
separate gene set, http://www.sabiosciences.com/rt_pcr_product/HTML/PAMM-043A.html#function).
Normalized enrichment scores (NES) with P < 0.05
and false discovery rates (FDR) < 0.25 were considered statistically significant.
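The enrichment statistic behind these GSEA results is a running-sum, Kolmogorov–Smirnov-like score over a ranked gene list. A minimal sketch of that score, assuming the weighted scheme with exponent p = 1; it omits the permutation step GSEA uses to derive NES, P and FDR, and the function name and input layout are hypothetical:

```python
from typing import List, Set, Tuple


def enrichment_score(ranked: List[Tuple[str, float]], gene_set: Set[str], p: float = 1.0) -> float:
    """GSEA-style weighted running-sum enrichment score.

    ranked: (gene, ranking metric) pairs sorted from most up- to most down-regulated.
    Hits step up by |metric|**p normalized over all hits; misses step down uniformly.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [score for gene, score in ranked if gene in gene_set]
    n_miss = len(ranked) - len(hits)
    norm_hit = sum(abs(s) ** p for s in hits)
    if not hits or n_miss == 0 or norm_hit == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    miss_step = 1.0 / n_miss
    running, best = 0.0, 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) ** p / norm_hit
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running
    return best
```

With p = 0 every hit receives equal weight and the score reduces to the classic Kolmogorov–Smirnov statistic; a positive score indicates enrichment at the top of the ranking, a negative one at the bottom.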

For quantitative reverse-transcriptase PCR analyses of stem-cell-related genes
in lymphoma cells, RNA extracted with Trizol (Invitrogen) was transcribed into
cDNA using SuperScript II reverse transcriptase (Invitrogen). A panel of
established stem-cell-related markers, consisting of mouse Abcg2, Cebpb, Kit, Cd34,
Cd44, Prom1 (also known as Cd133), Slamf1 (also known as Cd150), Klf4 and Ly6a
(also known as Sca1) or human ABCG2, CD34, CD44, PROM1 (also known as
CD133), SLAMF1 (also known as CD150) and LGR5; a panel of Wnt signalling targets
(Ccnd1, Fosl1, Fzd3, Id2 and Met); and a panel of established mouse SASP factors
(Igfbp6, Ccl2, Ccl20, Cxcl1, Ctgf, Il6, Kitl and Tnfa) were analysed by qPCR using
commercially available TaqMan assays (Applied Biosystems). Transcript
quantification was calculated as 2^(−ΔΔCt), based on ΔΔCt = ΔCt(treated) − ΔCt(untreated), with
GAPDH transcript levels as an internal control.
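The 2^(−ΔΔCt) quantification can be written out step by step. A minimal sketch, assuming near-100% amplification efficiency for both target and GAPDH and averaging replicate Ct values before normalization (both assumptions); `relative_expression` is a hypothetical helper and the example Ct values are illustrative, not data from this study:

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_treated: Sequence[float], gapdh_treated: Sequence[float],
                        target_untreated: Sequence[float], gapdh_untreated: Sequence[float]) -> float:
    """Fold change 2^(-ddCt) of a target transcript, treated vs untreated.

    Each argument holds replicate Ct values; replicates are averaged first (an assumption).
    """
    d_ct_treated = mean(target_treated) - mean(gapdh_treated)  # normalize to GAPDH
    d_ct_untreated = mean(target_untreated) - mean(gapdh_untreated)
    dd_ct = d_ct_treated - d_ct_untreated  # treated relative to untreated
    return 2.0 ** (-dd_ct)
```

For example, `relative_expression([24.0], [18.0], [26.0], [18.0])` returns 4.0: after treatment the target crosses the threshold two cycles earlier relative to GAPDH, which corresponds to roughly fourfold higher expression.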

Protein-based expression analyses. Immunophenotyping by flow cytometry
was carried out as described previously, using primary antibodies
directed against human CD34 (BD Biosciences, 560940, 1:200) and human CD33
(BD Biosciences, 555450, 1:200), or against mouse antigens: H3K9me3 (Abcam,
ab8898, 1:2,000), β-catenin (eBiosciences, 50-2567, 1:20), Thy1.2 (BD Biosciences,
553005, 1:200), TdT (Miltenyi, 130-100-749, 1:10), Kit (BD Biosciences, 553355,
1:200) and Sca1 (BD Biosciences, 557404, 1:200), followed by secondary antibodies:
anti-rabbit AlexaFluor 594 (Invitrogen A21207, 1:200) and Streptavidin-APC (BD
Biosciences, 554067, 1:2,000).

For immunoblotting analyses, whole-cell pellets were lysed in Laemmli sample
buffer (60 mM Tris-HCl at pH 6.8, 10% glycerol, 2% SDS, 5% 2-mercaptoethanol)
supplemented with protease and phosphatase inhibitors, resolved by
electrophoresis on a 12% SDS polyacrylamide gel (SDS-PAGE), transferred onto an
Immobilon-P membrane (Millipore) and probed using antibodies against total
β-catenin (BD Biosciences, 610153, 1:200), active β-catenin (dephosphorylated at
serine 37 (Ser37) and threonine 41 (Thr41); Millipore, 05-665, 1:1,000), H3K9me3
(Abcam, ab8898, 1:2,000), total Erk (Cell Signaling Technology (CST), 9102,
1:1,000), phospho-Erk1/2 (that is, Erk1/2 phosphorylated at Thr202 and Tyr204;
CST, 4376, 1:1,000), total Akt (CST, 9272, 1:1,000), phospho-Akt (that is,
Akt-P-Ser473; CST, 4060, 1:2,000), total GSK3β (CST, 12456, 1:1,000), phospho-GSK3β
(that is, GSK3β-P-Ser9; CST, 5558, 1:1,000) and α-tubulin (Sigma, T5168, 1:500)
as a loading control.

For immunofluorescence, cells were fixed in 4% paraformaldehyde,
permeabilized with 0.1% Triton X-100/PBS, blocked in 1% bovine serum albumin
supplemented with the anti-mouse Cd32/Cd16 antibody (BD Biosciences, 53142,
1:50) and incubated with a primary antibody against total β-catenin (1:200),
followed by 0.01% Tween 20 as detergent buffer and Alexa Fluor 594 (Invitrogen
A11008, 1:5,000) as a secondary anti-mouse IgG antibody. The slides were
stained with 4′,6-diamidino-2-phenylindole (DAPI, Biolegend, 422801, 1:1,000
in PBS) as a nuclear counterstain, and mounted with Mowiol 4-88 (Calbiochem).
Immunohistochemistry was performed on formalin-fixed, paraffin-embedded
lymph-node sections as described previously. Cryo-sections of mouse lymph
nodes were stained with a fluorescein isothiocyanate-conjugated antibody against
total β-catenin (BD Biosciences, 562505, 1:200), and human DLBCL sections were
stained with a primary antibody against total β-catenin (BD Biosciences, 610153,
1:200), followed by a secondary anti-mouse IgG antibody (1:1,000, Dako REAL
Detection System (labelled streptavidin-biotin), Dako, K5005).

Global proteome analysis. Suv39h1−;Bcl2;Suv39h1-ER cells were sampled in
ice-cold methanol after five days of ADR + 4-OHT treatment. 50 μg of the protein
extracts were processed using an xt-PAL (CTC Analytics) pipetting robot with the
Chronos software package (Axel Semrau) and reduced with 1 mM
tris(2-carboxyethyl)phosphine. Free sulfhydryl groups were carbamidomethylated using 5.5 mM
chloroacetamide. The proteins were digested using 0.5 μg sequencing-grade
endopeptidase LysC (Wako) for 3 h at room temperature, and subsequently diluted with four
volumes of 50 mM ammonium bicarbonate. Tryptic digestion occurred over 10 h at
room temperature using 1 μg of sequencing-grade trypsin (Promega). The reaction
was stopped by adding trifluoroacetic acid to a final pH of 2. The peptides were
purified using C18 stage tips (3M). By applying the dimethyl labelling technique,
the untreated lymphoma samples, serving as the reference, were ‘light’-labelled,
whereas the others (ADR + 4-OHT-treated) were ‘heavy’-labelled, on the xt-PAL
machine by automatically adding 4 μl light (+28 Da) or heavy (+32 Da)
formaldehyde and 4 μl cyanoborohydride to a final concentration of 0.8%. The reaction
was carried out overnight, quenched by 16 μl of 50 mM ammonium bicarbonate
buffer and acidified by 8 μl 50% trifluoroacetic acid. The ‘heavy’- and ‘light’-labelled
samples were mixed in a 1:1 ratio and measured as technical duplicates on a
Q-Exactive mass spectrometer (Thermo Fisher) coupled to a Proxeon nano-LC
system (Thermo Fisher) in data-dependent acquisition mode, selecting the top
ten peaks for higher-energy collisional dissociation fragmentation. A three-hour
gradient (solvent A: 5% acetonitrile, 0.1% formic acid; solvent B: 80% acetonitrile,
0.1% formic acid) was applied to the samples using a custom-made nano-LC column
(0.075 mm × 250 mm, 3 μm ReproSil C18, Dr. Maisch GmbH). The peptides were
eluted in gradients of 4 to 76% acetonitrile and 0.1% formic acid in water at flow
rates of 0.25 μl min−1. Mass spectrometric acquisition was performed at a
resolution of 70,000 in the scan range of 300 to 1,700 m/z. Dynamic exclusion was set
to 30 s and the normalized collision energy to 26 eV. For the automatic
interpretation of the recorded spectral data, the MaxQuant software version 1.2.2.5 (Max
Planck Institute) was used, with a multiplicity of 2 for dimethyl labelling. An
FDR of 0.01 was applied at the peptide and protein level, and an Andromeda-based
search was performed using a mouse International Protein Index database
(ipi.MOUSE.v3.84.fasta). Heavy/light ratios from the mass spectrometric
measurements were log-transformed using the R statistical software
(R Foundation for Statistical Computing). Three replicates were used to calculate
mean values and significance levels using the Wilcoxon test. All identifications with
a −log10-transformed P value > 1 were considered significant.
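The significance rule above is stated on a −log10 scale, which is the same as requiring P < 0.1. A minimal sketch of the per-protein summary, assuming base-2 logarithms for the heavy/light ratios (the base is not stated) and taking the Wilcoxon P value as an input rather than recomputing it; `summarize_protein` is a hypothetical helper:

```python
import math
from statistics import mean
from typing import List, Tuple


def summarize_protein(hl_ratios: List[float], p_value: float) -> Tuple[float, float, bool]:
    """Mean log2 heavy/light ratio across replicates, -log10 P and a significance flag."""
    if any(r <= 0 for r in hl_ratios) or not 0 < p_value <= 1:
        raise ValueError("ratios must be positive and P must lie in (0, 1]")
    mean_log2 = mean(math.log2(r) for r in hl_ratios)
    neg_log10_p = -math.log10(p_value)
    # The cut-off -log10(P) > 1 is the same as P < 0.1
    return mean_log2, neg_log10_p, neg_log10_p > 1
```

For example, replicate ratios of 2, 4 and 8 with P = 0.05 give a mean log2 ratio of 2.0 and −log10 P of about 1.30, which passes the cut-off.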

Chromatin immunoprecipitation. Chromatin immunoprecipitation was performed
according to Young and colleagues with minor modifications. 1 × 10^7
cells were fixed for 20 min in a 1% formaldehyde solution. The fixation was stopped
with 0.1 M glycine, and the cell pellet was lysed and sonicated in 300 μl buffer LB3
(Bioruptor Sonicator, two cycles of 15 min each at high power in pulsed mode (30 s
on, 30 s off)). 30 μl of 10% Triton X-100 was added and the sample was centrifuged
at 13,000 rpm for 10 min at 4 °C. The supernatant was removed and an aliquot was
retained as the input DNA sample. For immunoprecipitation, 140 μl of the
supernatant was mixed with 50 μl of Dynabeads Protein G (Life Technologies/Invitrogen),
pre-coated with 5 μg of an H3K4me3 antibody (A5051-001P, Diagenode) or an
H3K27me3 antibody (39155, Active Motif), and incubated at 4 °C overnight.
After incubation, the beads were magnetically separated from the supernatant,
washed and eluted. After reverse-crosslinking and RNase A and proteinase K
digestion, the DNA was extracted with phenol/chloroform and used as a template
for qPCR. Sequence information of the specific primers used is available upon
request. Enrichments were calculated according to the ΔΔCt method, with Prame
as endogenous control and the input as calibrator. The values of the relative
enrichments for the 4-OHT/ADR-treated samples were divided by the corresponding
ADR sample values.
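The two-step normalization described above (Prame as endogenous control, input as calibrator, then division by the ADR-only value) can be made explicit. A minimal sketch, assuming one mean Ct value per sample and roughly equal amplification efficiencies; both function names are hypothetical and the example Ct values are illustrative only:

```python
def chip_enrichment(ct_ip_target: float, ct_ip_prame: float,
                    ct_input_target: float, ct_input_prame: float) -> float:
    """Relative enrichment 2^(-ddCt): target locus vs the Prame control locus, calibrated to input."""
    d_ct_ip = ct_ip_target - ct_ip_prame  # normalize the IP to the endogenous control
    d_ct_input = ct_input_target - ct_input_prame  # same normalization in the input sample
    return 2.0 ** (-(d_ct_ip - d_ct_input))


def treated_vs_adr(enrichment_4oht_adr: float, enrichment_adr: float) -> float:
    """4-OHT/ADR enrichment divided by the corresponding ADR-only enrichment."""
    return enrichment_4oht_adr / enrichment_adr
```

With illustrative Ct values, `chip_enrichment(28.0, 30.0, 25.0, 25.0)` gives 4.0 and `chip_enrichment(29.0, 30.0, 25.0, 25.0)` gives 2.0, so `treated_vs_adr` returns 2.0.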

Statistical evaluation. On the basis of previous experience with the Eμ-Myc transgenic mouse lymphoma model, sample sizes typically reflect three to five individual primary tumours as independent biological replicates. All quantifications from staining reactions (for example, immunostainings or SA-β-gal assays) reflect at least three samples, each with at least 100 events counted (typically in three different areas). For assessing long-term outcome after in vivo treatments, six or more tumour-bearing animals per arm were used. No statistical method was used to predetermine sample size. No data were excluded; all samples and animals that met proper experimental conditions were included in the analysis. In tumour-initiation assays, a transplanted mouse scored positive if palpable lymphadenopathy developed at any time point during the 100-day observation period. The tumour-initiation data were analysed using the ELDA (Extreme Limiting Dilution Analysis) software package at http://bioinf.wehi.edu.au/software/elda/ (ref. 55) with a confidence interval of 95%. Unless stated otherwise, data are presented as arithmetic means ± standard deviation (s.d.), and statistical analyses were based on paired or unpaired two-sided t-tests. Data not following a normal distribution (by Kolmogorov-Smirnov test) were analysed by unpaired t-test with Welch's correction. Similar variance between groups was not assumed. P < 0.05 was considered statistically significant. In box-and-whisker plots, boxes indicate the first and third quartiles with the median, and whiskers indicate the minimum and maximum values. For GSEA, the non-parametric Kolmogorov-Smirnov test was applied. Significant enrichment was accepted when P < 0.05 and FDR < 0.25, the default significance levels for the method.
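For readers reproducing the unpaired comparisons, Welch's t statistic and the Welch-Satterthwaite degrees of freedom can be computed as in the minimal standard-library sketch below. The function name is hypothetical. The two-sided P value needs the t-distribution CDF, so in practice it would come from a statistics package, for example SciPy's `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance two-sample t-test.

    t  = (mean_a - mean_b) / sqrt(s_a^2/n_a + s_b^2/n_b)
    df = (s_a^2/n_a + s_b^2/n_b)^2
         / ((s_a^2/n_a)^2/(n_a - 1) + (s_b^2/n_b)^2/(n_b - 1))
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Invented values, for illustration only.
    t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0, 8.0])
    print(round(t, 4), round(df, 4))  # -4.3818 5.8824
```

Because no pooled variance is used, the degrees of freedom are generally non-integer and shrink toward the smaller group's n - 1 when variances differ strongly.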

Data availability. Microarray datasets produced in our laboratory and analysed 
in this study are available at the Gene Expression Omnibus (GEO) repository of 
the National Center for Biotechnology Information, under the accession numbers 
GSE31099 and GSE44355, for control;Bcl2 and Suv39h1−;Bcl2 lymphomas, respectively. Source Data for Figs 1-4 and Extended Data Figs 1-10 are provided with
the online version of this paper. All other datasets generated during this study are 
available from the corresponding author on reasonable request. 
Extended Data Figure 1 | Senescent cells of mouse and human origin present with enhanced stem-cell markers and functionalities.

a, 5,401 probe sets (corresponding to 3,867 genes) differentially expressed in TIS were determined from the transcriptome data comparing untreated and ADR-senescent primary control;Bcl2 lymphomas by two-way ANOVA adjusted for multiple testing (cut-off q < 0.05, n = 12 biologically independent samples). 181 out of 737 genes belonging to an ATSC signature and 43 out of 337 genes of a core embryonic stem-cell (ESC) signature were detected and marked orange and blue, respectively, in the fold-change-ranked gene list. Whereas the expression of core embryonic stem-cell genes was not correlated with senescence, ATSC transcripts exhibited a strong association with TIS. b, Senescence-selective gene set enrichment pattern of proliferation- and stem-cell-related gene modules (including haematopoietic stem cell (HSC) and long-term HSC (LT-HSC) signatures) in control;Bcl2 and Suv39h1−;Bcl2 lymphoma cells as in Fig. 1a. GSEA based on the Kolmogorov-Smirnov test, with negative NES indicating enrichment in untreated lymphomas and positive NES reflecting enrichment in TIS. n = 12 biologically independent control;Bcl2 samples and n = 5 Suv39h1−;Bcl2 samples. NES with P < 0.05 are considered statistically significant and are shown in red. c, Senescence induction by ADR treatment in various human cell lines representing haematological malignancies, colorectal cancers and melanomas, or in primary samples from patients with B-CLL, as determined by SA-β-gal staining (mean percentage of positive cells ± s.d., n = 3 independent experiments for cell lines; n = 4 individual B-CLL samples). TIS-competent cells are defined by a greater
than fourfold induction of SA-β-gal-positive cells (with the exception of B-CLL samples, in which SA-β-gal-positive cells were at least threefold induced) and are depicted as a blue box symbol in Fig. 1c. d, ABC transporter activity in cells as in Fig. 1a, measured by the efflux of a fluorescent substrate with and without the ABC transporter inhibitor verapamil. Representative plots of four independent lymphomas tested per genotype. e, Enhanced expression of the stem-cell marker CD34 in the RCK8 cell line or primary human B-cell leukaemia samples exposed to ADR treatment in vitro. Mean fluorescence intensity ± s.d. from three independent experiments (RCK8 cells) and five individual leukaemia cases, determined by flow cytometry. Two-tailed, unpaired t-test with Welch's correction, *P < 0.05. f, TIS-mediated increase and verapamil-dependent blockage of ABC transporter activity in ADR-senescent RCK8 cells and primary human B-cell leukaemia samples as in e. One representative out of three independent experiments shown. g, SAS occurring in non-malignant senescence scenarios: GSEA of proliferation- or stem-cell-related gene sets (as in b) in publicly available transcriptome data representing different models of replicative senescence: primary human mammary epithelial cells in stasis or agonescence (GSE16058; 12 prestasis, 9 stasis and 4 agonescence individual biological samples), high-passage BJ human skin fibroblasts (GSE13330; n = 6 pairs of proliferating/senescent cells from individual donors) or high-passage primary human mesenchymal stem cells (GSE9593; n = 3 pairs of proliferating/senescent cells from individual donors).
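Panels b and g report normalized enrichment scores from GSEA. To illustrate the Kolmogorov-Smirnov-like running-sum statistic behind them, the sketch below computes a weighted enrichment score (ES) for one gene set on a pre-ranked list, following the published GSEA definition with weight exponent p = 1. It is a simplified sketch, not the GSEA software used in this study: the permutation step that converts ES into NES, P and FDR is omitted, and the gene names in the example are hypothetical.

```python
def enrichment_score(ranked_genes: list[str], scores: list[float],
                     gene_set: set[str], p: float = 1.0) -> float:
    """Weighted running-sum enrichment score for a pre-ranked gene list.

    Genes are ordered from most up- to most down-regulated (here, positive
    scores would mean higher in TIS). Walking down the list, each hit raises
    the running sum by |score|^p / N_R (N_R = summed weight of all hits) and
    each miss lowers it by 1 / (N - N_H). The ES is the running-sum value
    with the largest absolute deviation from zero.
    """
    if len(ranked_genes) != len(scores):
        raise ValueError("ranked_genes and scores must have equal length")
    hits = [g in gene_set for g in ranked_genes]
    n_total, n_hit = len(ranked_genes), sum(hits)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must overlap the list only partially")
    n_r = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_r == 0:
        raise ValueError("gene-set members all have zero score")
    miss_step = 1.0 / (n_total - n_hit)
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / n_r if h else -miss_step
        if abs(running) > abs(es):
            es = running
    return es


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]  # hypothetical ranked list
    scores = [4.0, 3.0, 2.0, 1.0]
    print(enrichment_score(genes, scores, {"g1"}))  # 1.0: set at the top
    print(enrichment_score(genes, scores, {"g4"}))  # about -1.0: set at the bottom
```

A positive ES indicates that the set concentrates at the top of the ranking and a negative ES that it concentrates at the bottom. This mirrors the sign convention of the NES values in the legend.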
Extended Data Figure 2 | Genetic, biochemical and functional properties of regulatable senescence models. a, Graphic illustration of the model system engineered to stably express a regulatable senescence-essential gene moiety in Suv39h1-proficient and -deficient Eμ-Myc transgenic, Bcl2-infected lymphoma variants, of which only Suv39h1−;Bcl2;Suv39h1-ERT2 cells regain conditional TIS capability when exposed to 4-OHT. b, Relative transcript levels of the indicated stem-cell-related and Wnt target (asterisk) genes by qPCR in Suv39h1−;Bcl2;Suv39h1-ERT2 lymphoma cells exposed to the indicated treatments for five days. Results represent mean fold induction relative to the untreated condition ± s.d. (n = 3 biologically independent samples). c, Global proteome analysis of total Suv39h1−;Bcl2;Suv39h1-ERT2 cell lysates after five days of ADR + 4-OHT treatment, showing mean protein expression changes relative to the untreated condition (x axis) and their statistical significance (y axis); n = 3 biologically independent samples analysed by Wilcoxon test. All identifications with a −log10-transformed P value greater than 1 (that is, P < 0.1) were considered significant. Dots representing ATSC factors are highlighted in orange. d, Immunoblot of H3K9me3 expression in Suv39h1−;Bcl2;Suv39h1-ERT2 lymphoma cells treated as in b (‘treatment’) and monitored at the indicated passages in 4-OHT/ADR-free medium (‘post-treatment’; p1-3, each passage reflecting 7 days in culture). Never senescent ADR-only-pretreated and previously senescent ADR + 4-OHT-pretreated lymphoma cells were analysed; α-tubulin is used as a loading control. One out of two independent experiments shown. For gel source data, see Supplementary Fig. 1. e, f, Growth curve analysis (e) and SA-β-gal reactivity time course (f) of cells treated as in d. Results represent mean cell numbers or percentages of positive cells, respectively, ± s.d. from three biologically independent samples.
g, Kinetics of the proliferation marker EdU and the fluorescent SA-β-gal marker in Suv39h1−;Bcl2;Suv39h1-ERT2 lymphoma cells after five days of ADR + 4-OHT treatment (‘treatment’) and subsequent passages in 4-OHT/ADR-free medium (‘post-treatment’; p1-3, each passage reflecting seven days in culture), demonstrating outgrowth of senescent (SA-β-gal+) cells after termination of the 4-OHT/ADR treatment. Mean percentages of EdU+/SA-β-gal+ and EdU+/SA-β-gal− cells ± s.d., n = 4 biologically independent samples. Representative photomicrographs from cell populations marked by red circles are shown in Fig. 2a. h, Competition assays of matched passage 2 previously senescent (GFP-labelled) and never senescent (DsRed-labelled) lymphomas plated at an equal ratio (top) and evaluated by fluorescence microscopy-scored colony formation in vitro (bottom left) and by flow cytometric analysis of lymphoma cells isolated from manifest tumours after transplantation (bottom right). Numbers reflect the ratio of red- to green-fluorescent colonies or cells, respectively. One representative out of four independent experiments shown, including colour reversal.
Extended Data Figure 3 | Senescence-released (previously senescent) cancer cells display higher tumour-initiating capacity than their never-senescent counterparts. a-d, Growth properties of conditionally senescent lymphoma cells analysed as in Fig. 2a, b, but using p53-ERTAM;Bcl2 lymphoma cells with ADR + 4-OHT treatment (a, b), or Suv39h1−;Bcl2;Suv39h1-ERT2 lymphoma cells exposed to a single dose of γ-irradiation (8 Gy) instead of ADR, followed by five days of 4-OHT treatment and subsequent passaging in 4-OHT-free medium (c, d). Results presented as mean positive cells or mean colony numbers ± s.d.; n = 4 (a, c, d) or n = 3 (b) biologically independent samples. Representative photomicrographs from one out of three independent experiments (a, c). Two-tailed, unpaired t-test with Welch's correction, comparing ADR- and 4-OHT + ADR-pretreated lymphomas at p6, or 8 Gy- and 4-OHT + 8 Gy-treated lymphomas at p5. *P < 0.05 (b, d). It is noteworthy that the superior growth and clonogenicity of post-senescent cells can be explained neither by rare cells that may simply have bypassed senescence, because the matching never senescent (that is, senescence bypasser) group presented with inferior clonogenicity, nor by an enhanced death rate of non-stem cells in the Suv39h1-proficient aliquot, because no significant differences in viability were observed between never senescent and previously senescent groups throughout these experiments. Viability, determined by flow cytometry as the percentage of annexin V/PI double-negative cells,
was typically greater than 80% and comparable between never senescent and previously senescent cells (not shown; the same applies to Figs 2a and 4a). Growth-promoting mutations are also unlikely, as senescent cells stopped replicating their DNA. e, f, Colony formation assay of untreated versus five-day-ADR-senescent human RCK8 lymphoma cells (e) or LS174T colon carcinoma cells (f) that were exposed to an shp53 lentivirus or mock infection on day five of ADR treatment, with p53 knockdown enabling outgrowth from fully established senescence. As observed for mouse lymphoma cells, post-senescent RCK8 and LS174T cells, after just three passages, outperformed the clonogenic potential of tumour cells that were equally exposed to shRNA against p53 but never experienced senescence. Results represent mean colony numbers at the indicated passages (each reflecting seven days in ADR-free methylcellulose medium) ± s.d., n = 3 independent experiments. Two-tailed, unpaired t-test with Welch's correction, comparing untreated shp53 versus ADR + shp53 at p5 (e) or p4 (f). *P < 0.05. g, TIS re-inducibility in Suv39h1−;Bcl2;Suv39h1-ERT2 previously senescent cells (at passage 2, compare with Fig. 2a) re-exposed to 4-OHT and ADR for five days, as detected by SA-β-gal staining (top) and BrdU/PI incorporation (bottom). Results represent mean percentages of positive cells ± s.d. (n = 4 independent lymphomas).
Extended Data Figure 4 | The senescence-associated secretory phenotype (SASP) is dispensable for senescence-associated stemness (SAS) induction. a, Expression of a panel of SASP transcripts by qPCR in Suv39h1-regulatable lymphoma cells after five days of ADR + 4-OHT exposure, and after two passages in 4-OHT/ADR-free medium (that is, in never senescent and previously senescent cells), showing SASP upregulation in TIS and its downregulation back to baseline levels in senescence-released previously senescent cells. Results represent mean fold induction relative to untreated lymphomas ± s.d. (n = 3 biologically independent samples). b, Blunting SASP production (top) by NF-κB super-repressor IκBαΔN (NF-κB-SR)-mediated genetic inhibition of NF-κB as the major SASP driver in TIS cells (without compromising their ability to enter TIS) did not prevent acquisition of stemness markers (bottom) by qPCR. Results represent mean fold induction relative to mock-transduced untreated cells ± s.d. (n = 4 biologically independent samples). c, Co-expression of the stem-cell marker Sca1 and the TIS marker H3K9me3 by flow cytometry in NF-κB-SR-expressing control;Bcl2 cells exposed to ADR for five days, indicating uncompromised SAS induction. Percentages indicate mean Sca1/H3K9me3 double-positive cells ± s.d. (n = 4 biologically independent samples). d, ABC transporter activity by flow cytometry in control;Bcl2;NF-κB-SR cells as in c, again demonstrating strong induction of stem-cell-reminiscent ABC transporter activity in TIS cells (compare with Extended Data Fig. 1d) irrespective of their blunted SASP response. Representative plots out of four independent lymphomas shown.


© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


Extended Data Figure 5 panel content (recovered from the figure):
a, GSEA of stem-cell-relevant signalling gene sets; arrows in the figure mark the two gene sets plotted in b.
Gene set | control;Bcl2 NES | P | Suv39h1−;Bcl2 NES | P
WNT_CELLGROWTH_AND_PROLIFERATION | 1.634 | 0.014 | 1.247 | 0.163
GO_NOTCH_SIGNALING_PATHWAY | 1.529 | 0.001 | -0.953 | 0.570
GO_WNT_SIGNALING_PATHWAY | 1.441 | 0.000 | -0.928 | 0.684
GO_CANONICAL_WNT_SIGNALING | 1.438 | 0.004 | -0.957 | 0.577
WNT_STANFORD | 1.377 | 0.047 | -0.920 | 0.614
ST_WNT_BETA_CATENIN_PATHWAY | 1.368 | 0.081 | -0.910 | 0.614
WNT_SIGNALING_PCR_ARRAY | 1.331 | 0.067 | -0.951 | 0.569
PID_WNT_CANONICAL_PATHWAY | 1.229 | 0.207 | -1.085 | 0.363
NOTCH_TARGETS_PCR_ARRAY | 1.164 | 0.285 | -0.791 | 0.792
PID_NOTCH_PATHWAY | 1.112 | 0.289 | 1.086 | 0.277
REACTOME_SIGNALING_BY_WNT_MOUSE | 1.084 | 0.341 | 0.870 | 0.798
PID_WNT_SIGNALING_PATHWAY | 1.075 | 0.389 | -1.007 | 0.473
REACTOME_SIGNALING_BY_NOTCH | 0.994 | 0.464 | 0.767 | 0.964
NOTCH_SIGNALING_PCR_ARRAY | 0.942 | 0.578 | 0.919 | 0.660
GO_SMOOTHENED_SIGNALING_PATHWAY | 0.938 | 0.589 | -1.150 | 0.202
HEDGEHOG_PCR_ARRAY | 0.838 | 0.771 | -1.343 | 0.052
GO_NONCANONICAL_WNT_SIGNALING | 0.807 | 0.786 | -1.316 | 0.099
PID_WNT_NONCANONICAL_PATHWAY | 0.738 | 0.854 | -1.536 | 0.034
REACTOME_SIGNALING_BY_HEDGEHOG_MOUSE | 0.693 | 0.941 | -1.110 | 0.314
PID_HEDGEHOG_2PATHWAY | -0.977 | 0.497 | -1.339 | 0.112
b, Enrichment plots for 'GO Canonical Wnt signaling' and 'Selected Wnt targets, growth and proliferation' (untreated versus ADR) in control;Bcl2 and Suv39h1−;Bcl2 cells. c, Immunoblot of active and total β-catenin, with α-tubulin. d, Wnt reporter activity after ADR treatment in haematological malignancies (RCK8, Eheb, K562, Mec1), colorectal cancers (SW480, LS174T, DLD-1, Caco-2) and melanomas (WM266.4, SkMel28, MeWo, Omm2.3).


Extended Data Figure 5 | Wnt signalling is upregulated in senescence. a, GSEA of gene sets probing stem-cell-relevant signalling pathways in ADR-senescent control;Bcl2 or TIS-incompetent Suv39h1−;Bcl2 cells (as in Fig. 1a). Positive NES indicate enrichment in TIS lymphomas. NES with P < 0.05 are considered statistically significant and are presented in red. n = 12 pairs of independent lymphomas. b, GSEA enrichment plots of selected gene sets presented in a: GO term ‘Canonical Wnt receptor signaling’ (top) or a subset of proliferation-relevant Wnt target genes (bottom), showing significant enrichment in ADR-senescent control;Bcl2 but not in TIS-incompetent Suv39h1−;Bcl2 cells. c, Immunoblot analysis of Ser37- and Thr41-dephosphorylated (that is, stabilized and nuclear-translocation-capable ‘active’) β-catenin and total β-catenin in three independent pairs of control;Bcl2 and Suv39h1−;Bcl2 lymphoma cells exposed to ADR for 5 days (+) or left untreated (−). α-Tubulin is used as a loading control. One out of two independent experiments shown. For gel source data, see Supplementary Fig. 1. d, Wnt activity measured by the TOPflash TCF reporter system (with FOPflash as negative control) in human cell lines in correlation with their senescence inducibility by ADR, as indicated by blue box symbols for senescence-competent cell lines (referring to Extended Data Fig. 1c). Results reflect the mean fold change in relative light units (between untreated and ADR-treated samples) of three independent experiments ± s.d.
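Panel d relates Wnt reporter activity to senescence competence. The sketch below shows one common way to compute such a readout. It assumes that TOPflash signal is first divided by the matching FOPflash signal, a usual normalization that the legend does not spell out, before the ADR-versus-untreated fold change is taken. The competence flag applies the SA-β-gal induction thresholds stated in the Extended Data Fig. 1c legend: more than fourfold for cell lines and at least threefold for B-CLL samples. Function names and RLU values are hypothetical.

```python
import statistics


def top_fop_ratio(top_rlu: float, fop_rlu: float) -> float:
    """TCF reporter signal corrected by the mutant-site FOPflash control (assumed normalization)."""
    return top_rlu / fop_rlu


def wnt_fold_change(top_ut: float, fop_ut: float,
                    top_adr: float, fop_adr: float) -> float:
    """ADR-treated over untreated TOP/FOP ratio for one experiment."""
    return top_fop_ratio(top_adr, fop_adr) / top_fop_ratio(top_ut, fop_ut)


def is_tis_competent(sa_bgal_fold: float, primary_bcll: bool = False) -> bool:
    """Thresholds from Extended Data Fig. 1c: >4-fold (cell lines), >=3-fold (B-CLL)."""
    return sa_bgal_fold >= 3.0 if primary_bcll else sa_bgal_fold > 4.0


def summarize(fold_changes: list[float]) -> tuple[float, float]:
    """Mean and sample s.d. across independent experiments."""
    return statistics.mean(fold_changes), statistics.stdev(fold_changes)


if __name__ == "__main__":
    # Three hypothetical experiments: (TOP_ut, FOP_ut, TOP_ADR, FOP_ADR) in RLU.
    runs = [(1000, 500, 6000, 600), (900, 450, 4000, 500), (1100, 550, 6600, 550)]
    folds = [wnt_fold_change(*r) for r in runs]  # [5.0, 4.0, 6.0]
    print(folds, summarize(folds), is_tis_competent(5.2))
```

Dividing by FOPflash corrects for TCF-independent background, so a fold change near 1 indicates no change in Wnt reporter activity.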
Extended Data Figure 6 | Cell-intrinsic activation of the Wnt signalling cascade in TIS. a, b, Expression of the indicated stem-cell-related transcripts by qPCR (a) and colony formation (b) in control;Bcl2 lymphomas infected with a constitutively active Ctnnb1 mutant (ΔNCtnnb1) or a mock retrovirus. Data represent mean expression fold change normalized to mock-infected cells and mean colony numbers, respectively, ± s.d. (n = 3 biologically independent samples). Two-tailed, unpaired t-test with Welch's correction. *P < 0.05. c, Immunoblot analysis of Ser9-phosphorylated (that is, inactivated) or total GSK3β, active or total β-catenin (as in Extended Data Fig. 5c), Thr202- and Tyr204-phosphorylated or total Erk1/2, and Ser473-phosphorylated or total Akt in control;Bcl2 lymphoma cells treated with ADR for five days, together with pharmacological inhibitors targeting the MAPK and PI3K pathways.
α-Tubulin was used as a loading control. One out of two independent experiments shown. For gel source data, see Supplementary Fig. 1. d, Expression of the indicated stem-cell-related transcripts by qPCR in never senescent and previously senescent Suv39h1−;Bcl2;Suv39h1-ERT2 cells (passage 2) exposed to Wnt signalling agonists (Wnt3a, Rspo2 or GSK3β inhibitor) for two days. The colour scale represents mean fold change normalized to never senescent cells not exposed to Wnt agonists ± s.d. (n = 3 individual lymphomas). e, Colony formation of never senescent and previously senescent cells (as in d) after seven days in methylcellulose medium supplemented with the indicated Wnt agonists (mean colony numbers ± s.d., n = 3 individual lymphomas). Two-tailed, unpaired t-test with Welch's correction. *P < 0.05.
Extended Data Figure 7 | Wnt signalling is dispensable for senescence induction, but required for senescence-associated stemness. a, Senescence induction by ADR in control;Bcl2 lymphoma cells with and without parallel application of the indicated pharmacological or genetic Wnt inhibitors (ICG-001, salinomycin or Ctnnb1 knockdown by shRNA (shCtnnb1)). Results reflect mean percentages of SA-β-gal-positive cells ± s.d. (n = 4 independent lymphomas). b, Expression of stemness-related transcripts by qPCR in ADR-treated control;Bcl2 lymphoma cells subjected to Ctnnb1 knockdown by retroviral shRNA infection (shCtnnb1). The colour scale represents mean fold induction normalized to ADR-untreated (ut) and vector-infected controls ± s.d. (n = 3 biologically independent samples). c, Relative viability of Suv39h1−;Bcl2;Suv39h1-ERT2 cells exposed to the indicated Wnt inhibitors either simultaneously with ADR + 4-OHT treatment (for the last 48 h of treatment) or at passage 2 after termination of ADR + 4-OHT (never senescent and previously senescent; treated with inhibitors for 48 h).
Results show relative viability normalized to samples without Wnt inhibitor treatment ± s.d. (n = 3 biologically independent samples). d, Colony formation of human LS174T colon carcinoma cells exposed to mock or shp53 lentivirus upon ADR-induced senescence and further propagated in ADR-free medium (corresponding to passage 3 in Extended Data Fig. 3f). Results show mean colony counts after seven-day exposure to the indicated Wnt inhibitors ± s.d. (n = 3 independent experiments per group). Two-tailed, unpaired t-test with Welch's correction, *P < 0.05. e, Individual survival times of the six matched never senescent and previously senescent lymphoma pairs (shown collectively in Fig. 3d). f, Individual survival times of mice bearing never senescent (left) and previously senescent lymphomas (right) after Wnt signalling inhibition by Ctnnb1 knockdown (shCtnnb1) or without inhibition. The line plots represent the same matched never senescent and previously senescent lymphomas as in e.
Extended Data Figure 8 | The previously senescent cell population maintains a stable fraction of Wnt-active stem cells over time. a, Detection of a slowly dividing subpopulation in previously senescent but not in never senescent lymphoma cells (arrow) by the CFSE membrane dye 1, 4 or 8 days after stopping the ADR + 4-OHT treatment. Experiment performed in triplicate. b, CFSEhigh previously senescent cells exhibited more pronounced nuclear β-catenin expression, indicating acquired stemness (passage 3 after 4-OHT/ADR removal; compare with c). One out of three independent experiments, each performed in triplicate. c, Co-staining of β-catenin and CFSE as in b in Suv39h1−;Bcl2;Suv39h1-ERT2 cells, untreated or exposed to ADR + 4-OHT for five days (‘treatment’) and subsequently passaged in 4-OHT/ADR-free medium (p1-2; each passage reflects seven days in culture). The slowly cycling (CFSEhigh) population was positive for β-catenin and persisted over time, although its relative percentage dropped owing to outgrowth of its (CFSElow) progeny. Numbers reflect mean percentages from three independent lymphomas ± s.d. d, e, Higher expression of ATSC- or Wnt-related (asterisks) transcripts by qPCR (d) and higher clonogenic capacity, which can be neutralized by the indicated pharmacological or genetic Wnt inhibitors (e), in flow-sorted β-catenin-high versus β-catenin-low previously senescent cells (passage 3 after 4-OHT/ADR removal). Mean expression levels normalized to
untreated cells and mean colony numbers, respectively, ± s.d., n = 4 biologically independent samples. Two-tailed, unpaired t-test with Welch's correction, *P < 0.05. f, Immunoblot analysis of β-catenin and H3K9me3 levels in human RCK8 lymphoma cells exposed to ADR for 5 days to induce senescence (‘treatment’), then stably transduced with an shp53 or mock lentivirus, and further propagated in ADR-free medium (‘post-treatment’; p1-5, each reflecting seven days in culture). The senescence-associated high levels of active and total β-catenin settle at a low but stable level at later passages. It is noteworthy that stably senescent ADR-pretreated, mock-infected cells were blotted only at p1. One representative out of three independent experiments shown, with α-tubulin as a loading control. For gel source data, see Supplementary Fig. 1. g, Co-expression of β-catenin and the stem-cell marker CD34 detected by flow cytometry in ADR-pretreated, shp53-infected RCK8 cells as in f, demonstrating a small but stable steady-state fraction of double-positive cells at later passages and explaining the lastingly enhanced colony-forming potential of previously senescent versus never senescent cells. Representative flow cytometry plots from three independent experiments (top) and mean percentages of double-positive cells ± s.d. (bottom) at the indicated passages (n = 3 independent experiments). Two-tailed, unpaired t-test with Welch's correction. *P < 0.05.
Extended Data Figure 9 | Spontaneous escape from senescence detected in cancer cells without genetic manipulation of senescence-relevant genes. Flow cytometric analysis of the proliferation marker EdU and a fluorescent SA-β-gal marker in control;Bcl2 cells treated with ADR or left untreated (top), and further cultivated in ADR-free medium (bottom). Co-expression of EdU in a small population of still SA-β-gal-positive cells demonstrates the ability of some ADR-senescent cells to escape the senescence arrest. Numbers represent mean percentages ± s.d. from four independent lymphomas. Photomicrographs depict representative cells from populations marked with red circles (n = 4 independent experiments).
Extended Data Figure 10 | Senescence-associated de novo generation of leukaemia stem cells upon depletion of the stem-cell-containing fraction in mouse and human leukaemia samples. a, Flow cytometry plots of mouse KrasG12D;DOX-on-shp53-GFP-induced T-cell acute lymphoblastic leukaemias (total splenocytes after short-term culture and retroviral Bcl2 infection), stained with a panel of mouse lineage antibodies before and after flow-based sorting of the Lin+GFP+ population. The Lin−GFP+ population (including Kit+Sca1+ leukaemia stem cells) was used as a positive control. Shown are representative plots (n = 3). b, Colony formation of mouse Lin+GFP+ leukaemia cells as in a, pretreated with ADR + doxycycline (DOX) for five days and subsequently seeded in ADR-free/DOX-supplemented medium, thus producing never senescent and previously senescent cells, respectively. Results represent mean colony counts at passage 2 (each passage reflecting 10 days in culture) ± s.d. (n = 3 biologically independent samples). Two-tailed, unpaired t-test with Welch's correction. *P < 0.05. c, Nuclear β-catenin expression by immunofluorescence (red) in equally five-day-ADR-exposed senescent versus non-senescent settings (that is, DOX− versus DOX+). DAPI was used as a nuclear counterstain (blue). Numbers represent mean percentages of β-catenin-positive cells ± s.d. (n = 3 biologically independent samples). d, Colony formation of never senescent and previously senescent leukaemia cells pretreated as in b (passage 3) with the addition of the indicated pharmacological Wnt inhibitors (mean colony numbers ± s.d., n = 3 biologically independent samples per group). *P < 0.05, two-tailed, unpaired t-test with Welch's correction. e, Senescence induction by SA-β-gal staining in mouse NrasG12D;MLL-AF9;DOX-on-shp53;Bcl2 bulk AML cells (Lin−Kit+Sca1+-depleted) after five days of ADR + DOX treatment. Numbers reflect mean percentages of SA-β-gal-positive cells ± s.d. (experiment performed in triplicate). Notably, viability, determined as the percentage of annexin V/PI double-negative cells, was typically greater than 80% and comparable between treatment groups. f, Stemness-related transcripts by qPCR
in conditionally senescent mouse AML cells as in e. Graphs represent mean fold induction ± s.d. (n = 3 independent experiments). g, Colony formation of mouse bulk leukaemia cells pretreated as in e, further propagated in ADR-free, DOX-containing medium for 14 days, and plated in methylcellulose medium supplemented with the Wnt inhibitors ICG-001 or salinomycin. Colonies were counted after seven days. Previously senescent AML cells, emerging via DOX-mediated p53 knockdown, presented with the highest, Wnt-dependent clonogenicity, which could be attenuated by pharmacological Wnt inhibition. Results represent mean colonies ± s.d. (n = 3 independent experiments). Two-tailed, unpaired t-test with Welch's correction. *P < 0.05. h, Colony formation of the CD34+ cell-depleted human AML cell line Molm13 (with constitutive retroviral Bcl2 expression) exposed to senescence-inducing ADR treatment for five days (‘treatment’) and subsequently transduced with the lentiviral shp53 or mock construct (p53 knockdown enabling outgrowth from fully established senescence). Results reflect mean colony numbers ± s.d. (n = 3 independent experiments). Two-tailed, unpaired t-test with Welch's correction. *P < 0.05. i, Flow cytometric detection of surface expression of the CD33 myeloid differentiation marker and the CD34 stem-cell marker in samples from patients with AML obtained at diagnosis, analysed before any cell cultivation and after six days of cultivation in vitro. Representative plots are shown (n = 5 individual patient samples). j, Expression of stemness-related transcripts in five-day-ADR-senescent versus untreated, ex vivo CD34+-depleted primary human AML cells as in i (qPCR; average fold induction ± s.d., n = 5 individual patient samples; left). Photomicrographs (right) confirm ADR-inducible senescence by SA-β-gal staining (mean percentages of SA-β-gal-positive cells ± s.d.; representative photomicrographs from five independent samples). k, Regained CD34 surface expression upon ADR-induced senescence in CD34+-depleted primary human AML cells as presented in j. Numbers reflect mean fluorescence intensity detected by flow cytometry ± s.d. (n = 5 individual patient samples). Two-tailed, paired t-test, *P < 0.05. l, ABC transporter activity in ADR-senescent versus untreated cells as in k. Representative plots are shown (n = 5 individual samples).
Therapeutic targeting of ependymoma as informed
by oncogenic enhancer profiling 
Genomic sequencing has driven precision-based oncology therapy; however, the genetic drivers of many malignancies remain unknown or non-targetable, so alternative approaches to the identification of therapeutic leads are necessary. Ependymomas are chemotherapy-resistant brain tumours, which, despite genomic sequencing, lack effective molecular targets. Intracranial ependymomas are segregated on the basis of anatomical location (supratentorial region or posterior fossa) and further divided into distinct molecular subgroups that reflect differences in the age of onset, gender predominance and response to therapy. The most common and aggressive subgroup, posterior fossa ependymoma group A (PF-EPN-A), occurs in young children and appears to lack recurrent somatic mutations. Conversely, posterior fossa ependymoma group B (PF-EPN-B) tumours display frequent large-scale copy number gains and losses but have favourable clinical outcomes. More than 70% of supratentorial ependymomas are defined by highly recurrent gene fusions in the NF-κB subunit gene RELA (ST-EPN-RELA), and a smaller number involve fusion of the gene encoding the transcriptional activator YAP1 (ST-EPN-YAP1). Subependymomas, a distinct histologic variant, can also be found within the supratentorial and posterior fossa compartments, and account for the majority of tumours in the molecular subgroups ST-EPN-SE and PF-EPN-SE. Here we describe mapping of active chromatin landscapes in 42 primary ependymomas in two non-overlapping primary ependymoma cohorts, with the goal of identifying essential super-enhancer-associated genes on which tumour cells depend. Enhancer regions revealed putative oncogenes, molecular targets and pathways; inhibition of these targets with small molecule inhibitors or short hairpin RNA diminished the proliferation of patient-derived neurospheres and increased survival in mouse models of ependymomas. Through profiling of transcriptional enhancers, our study provides a framework for target and drug discovery in other cancers that lack known genetic drivers and are therefore difficult to treat.

To pinpoint genes that depend on enhancers for their role in tumour formation, we characterized regions of actively transcribed chromatin in 42 primary intracranial ependymomas using histone 3 lysine 27 acetylation chromatin immunoprecipitation and sequencing (H3K27ac ChIP-seq), a histone mark of active chromatin, on two independent cohorts of fresh-frozen primary ependymoma specimens in two different facilities (‘Heidelberg’ and ‘Toronto’), each with a different H3K27 acetylation-specific antibody. Our analysis focused on the intersection of shared enhancers between these two datasets, integrated with whole-exome sequencing (WES), whole-genome sequencing (WGS), RNA sequencing (RNA-seq), DNA copy-number analysis and DNA methylation profiling (Extended Data Figs 1, 2; Supplementary Tables 1-7).

Figure 1 | H3K27ac profiles define active regulatory elements of ependymoma. a, Unsupervised hierarchical clustering of the top 10,000 variant enhancer loci detected in ependymomas compared to the Roadmap Epigenomics Consortium samples; n = 143 independent samples. b, c, Inflection plots indicating the identified ependymoma super enhancers. d, e, Venn diagrams depicting the number of shared enhancers (d) and super enhancers (e) between the Heidelberg (n = 24) and Toronto

‘Active’ typical enhancers were defined as significant H3K27ac 
peaks more than 2.5 kb from the nearest transcriptional start site. To 
perform unsupervised hierarchical clustering, the top 10,000 variant 
enhancer loci from both cohorts were compared to the Roadmap 
Epigenomics and ENCODE databases’ (Fig. 1a, Extended Data Figs 3, 4). 
Ependymoma enhancer profiles were distinct from those of other 
tissue types, marked by acquisition and loss of hundreds of enhancer 
loci (Extended Data Fig. 4). Consistent with prior literature, super 
enhancer domains were substantially associated with greater transcrip- 
tional load®? (Extended Data Fig. 4). We identified 2,196 and 3,176 
super enhancers in the Heidelberg and Toronto cohorts, respectively, 
and both cohorts shared a large proportion of super enhancer regions 
(Fig. lb-e, Supplementary Tables 8-10, Extended Data Fig. 4). The 
vast majority of super enhancers were tumour-specific and enriched 
with cancer-associated genes reported in other solid cancers, includ- 
ing PAX6, SKI, FGFRL1, FGFR1, and BOC (Fig. 1b, c, Supplementary 
Table 10, Extended Data Fig. 4). Several of these genes, such as 
EPHB2 and CCND1, have been previously validated as ependymoma 
oncogenes!°-!? (Extended Data Fig. 5). 
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(n= 18) independent ependymoma sample cohorts. f, Quantitative reverse 
transcription PCR showing knockdown efficiency of 15 ependymoma super- 
enhancer-associated genes (n = 3 technical replicates, error bars show s.d. 
Results were reproduced in independent biological duplicates). g, Percentage 
of top ependymoma super enhancer genes that demonstrate greater than 50% 
decrease in viability over seven days. Cell survival from knockdown of each 
gene was assayed and independently replicated as biological triplicates. 


To determine whether super enhancers reveal pathways and genes 
on which ependymoma cells depend, and which could be actionable 
by targeted therapy, the 15 top-ranking ependymoma super enhancer 
genes were validated in a series of 60 RNA interference short hairpin 
RNA (shRNA) knockdown time-course studies to demonstrate the 
feasibility of our approach to uncover novel cancer targets (Extended 
Data Fig. 6). Following transduction of ST-EPN-RELA patient- 
derived (EP1-NS) cells with shRNA constructs, the two most effective 
and specific shRNA constructs per gene were functionally validated 
(Fig. 1f). Globally, depletion of the top-ranking tumour-specific super 
enhancer genes impaired cell growth to varying degrees over seven 
days, compared to non-targeting shRNA controls (Extended Data 
Fig. 7). Using a stringent cut-off that required both independent shRNA
constructs (shRNA.1 and shRNA.2) to decrease cell viability by at
least 50% over seven days, we found that a majority (60%) of
ependymoma super enhancer genes were required for cellular
maintenance, supporting super enhancer mapping as a viable approach
for therapeutic target identification (Fig. 1g).
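The two-construct cut-off can be written as a simple filter. The sketch below is illustrative only: viability is assumed to be expressed as a fraction of the non-targeting shRNA control at day seven, and the gene labels and values are hypothetical placeholders, not data from this study.

```python
from typing import Dict, Tuple


def is_required(viability_sh1: float, viability_sh2: float,
                min_decrease: float = 0.5) -> bool:
    """True only if BOTH independent shRNAs lower viability by >= min_decrease.

    Viability is a fraction of the non-targeting control
    (1.0 = no effect, 0.3 = a 70% decrease).
    """
    return ((1.0 - viability_sh1) >= min_decrease
            and (1.0 - viability_sh2) >= min_decrease)


def fraction_required(results: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of genes that pass the cut-off with both constructs."""
    hits = [gene for gene, (v1, v2) in results.items() if is_required(v1, v2)]
    return len(hits) / len(results)


if __name__ == "__main__":
    # Hypothetical day-7 viabilities for (shRNA.1, shRNA.2).
    demo = {"GENE_A": (0.30, 0.40), "GENE_B": (0.30, 0.70), "GENE_C": (0.50, 0.20)}
    print(fraction_required(demo))
```

Requiring both constructs, rather than either one, guards against a single shRNA acting through an off-target effect. With 15 genes tested, the reported 60% corresponds to 9 genes passing.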

We next investigated whether the differences in enhancer land- 
scapes between molecular subgroups of ependymoma reflect tran- 
scriptional differences. In both cohorts, unsupervised hierarchical 
clustering of all enhancers demonstrated an unbiased segregation of 
ependymoma molecular subgroups (Fig. 2a-d, Extended Data Fig. 5). 
Molecular differences between ependymoma subgroups were sup- 
ported by robust segregation at the DNA methylation level (Fig. 2c). 
Subgroup-specific typical enhancers were enriched within large
H3K27-acetylated domains (that is, super enhancers), a finding confirmed
by unsupervised segregation of ependymoma subgroups using super
enhancer regions (Fig. 2e, f, Extended Data Fig. 5). We termed this
distinct class of super enhancers with subgroup-specific enhancer
activity SE-SSEAs, and similarly typical enhancers with subgroup-specific
activity TE-SSEAs. Over 86% of SE-SSEAs observed in the Heidelberg
cohort were confirmed by the Toronto cohort as active super enhancers
in the respective subgroup (Extended Data Fig. 5), thus uncovering
a distinct subset of super enhancers that were most common in the
PF-EPN-A, PF-EPN-B and ST-EPN-RELA subgroups of ependymoma
(Fig. 2g-l, Extended Data Fig. 5, Supplementary Tables 11-16). Owing
to the low prevalence of ST-EPN-YAP1, ST-EPN-SE, and PF-EPN-SE
tumours, these tumours were not represented in the Toronto cohort,
and further downstream analysis was based on the Heidelberg cohort
alone (Fig. 2g-l, Extended Data Fig. 5, Supplementary Tables 11-16).
SE-SSEA genes were associated with subgroup-specific gene expression,
further supporting the role of super enhancers as important
contributors to transcriptional output (Extended Data Fig. 5,
Supplementary Tables 11-16). SE-SSEA genes also converged on a subset
of signalling pathways that distinguished the molecular subgroups of
ependymoma, such as the polycomb repressive complex 1 (PRC1) and histone
deacetylase (HDAC4) pathways in ST-EPN-RELA tumours, both of
which can be inhibited by small molecules (Fig. 2m, Extended Data
Fig. 5, Supplementary Table 17).

Figure 2 | Active enhancers delineate subgroups of ependymoma.
a, b, Unsupervised hierarchical clustering of all H3K27ac enhancer loci in
Heidelberg (n = 24) and Toronto (n = 18) independent sample cohorts.
c, Combined t-distributed stochastic neighbour embedding (t-SNE)
analysis of the top 10,000 variably methylated Illumina 450K CpG probes.
d, Combined t-SNE analysis of all enhancer loci. n = 43 independent
samples. e, f, t-SNE analysis of the H3K27ac-marked super enhancer
regions in ependymoma. n = 42 independent samples. g-l, Inflection
plots indicating super enhancers with subgroup-specific enhancer activity
(SE-SSEA) in ependymomas. n = 24 independent samples. m, G-Profiler
pathway analysis of ependymoma subgroup super-enhancer-associated
genes with significant enrichment indicated as the false discovery rate
(FDR)-corrected P value. n = 24 independent samples.

To translate identified SE-SSEA genes in subgroups of ependymoma
into novel therapeutic leads, we first focused on ST-EPN-RELA
tumours, where we observed an SE-SSEA proximal to CACNA1H and
associated with its subgroup-restricted gene expression (Extended
Data Fig. 8). CRISPR-dCas9-KRAB-mediated repression of active
constituent enhancers within the CACNA1H super enhancer resulted
in downregulation of CACNA1H gene expression (Extended Data 
Fig. 8). Compared to a PF-EPN-A primary culture (S15-NS), cell pro- 
liferation of an ST-EPN-RELA patient-derived primary culture model 
(EP1-NS) was specifically impaired by shRNA-mediated knockdown of 
CACNA1H or pharmacologic blockade of its activity using the calcium 
channel inhibitor mibefradil (Extended Data Fig. 8). In a similar fashion,
we found that IGF2BP1 was super-enhancer-regulated preferentially in a
subset of PF-EPN-A tumours. shRNA-mediated targeting of
IGF2BP1 in PF-EPN-A ependymoma cultures, but not ST-EPN-RELA 
primary cultures, impaired cell proliferation, implicating IGF2BP1 
as a potential cancer dependency gene in PF-EPN-A ependymomas 
(Extended Data Fig. 8). Our findings thus identify candidate oncogenes 
that are associated with super enhancers as well as novel pathways spe- 
cific to subgroups of ependymoma. 

Figure 3 | Transcription factor circuitries of ependymoma. a, DNA
motifs enriched within shared ependymoma typical enhancers that overlap
with ATAC-seq peaks derived from the EP1-NS cell culture model, as
determined by HOMER motif analysis (see Methods and Supplementary
Table 18). TF, transcription factor. b, Heatmap of transcription factors
ranked by predicted activity using core circuitry analysis (left) and
presence or absence of self-loop activity (right). n = 18 independent
samples from Toronto cohort. c-f, shRNA constructs targeting
super-enhancer-associated genes ordered by normalized cell survival.
Highlighted in red are shRNAs targeting super-enhancer-associated core
transcription factors. Each gene was assayed with six technical replicates
and replicated in three independent biological experiments. g-l, Connections
between subgroup-specific transcription factors integrated with gene
expression in subgroups of ependymoma. n = 24 independent samples.

The regulation of cell-type-specific gene expression is often
dominated by only a small number of core transcription factors out of
the hundreds expressed within a given cell type¹³. As many important
transcription factor motifs, such as FOSL1, FOSL2, SOX9, RFX2,
and SOX2, were enriched across shared enhancers of ependymoma
(Fig. 3a, Supplementary Table 18), we sought to identify the principal
transcription factors that govern ependymoma cell identity across
subgroups using core regulatory circuitry analysis⁸,¹⁴
(Fig. 3b, Extended Data Fig. 9, Supplementary Table 19). A small set
of highly active transcription factors was identified, including SOX9,
RFX2, SOX2, ZBTB16, HES1, NFIA, and NFIB, which were highly
expressed in ependymoma compared to a large collection of normal
brain tissues (Fig. 3b, Extended Data Fig. 9). By contrast, transcription
factors that exhibited lower relative activity showed no significant
difference in gene expression compared to normal brain (Extended 
Data Fig. 9). RNA interference (RNAi) was used to functionally 
demonstrate that the ependymoma core transcription factors SOX9, 
RFX2, SOX2 and ZBTB16 were essential for ependymoma cell main- 
tenance (Fig. 3c-f, Extended Data Fig. 7). We hypothesized that this 
core model would be further specified by additional transcription fac- 
tors that delineate the transcriptional differences between molecular 
subgroups of ependymoma. An integrative analysis was performed 
to assess subgroup-specific enhancers, the expression of their target
genes within local topologically associated domains¹⁵, and the
enrichment of subgroup-specific transcription factor-binding motifs at these
subgroup-specific enhancer loci. Using this approach, we modelled 
regulatory circuitry maps of each molecular subgroup of ependymoma, 
as defined by distinct sets of transcription factors, which might be used 
to establish and/or maintain ependymoma subgroup identity (Fig. 3g-l, 
Supplementary Table 20). 
Figure 4 | Active regulatory maps identify candidate drugs against
ependymoma. a, Pie chart of candidate drug compounds detected by
integrating shared super enhancers with the Washington University Drug
Gene Interaction Database. b-d, Ependymoma cells and neural stem cell
line 1 (NSC1) controls treated with JQ1 (b), AZD1775 (c) or AZD4547
(d) for 72 h and assessed using an Alamar Blue stain. Error bars show
s.d. Experiment performed as six technical replicates and replicated
in biological triplicates. e, Kaplan-Meier curve for immunodeficient
mice bearing H.612 ependymomas, treated with vehicle or AZD4547
(25 mg kg⁻¹ d⁻¹). Significance of endpoint difference was assessed using
a log-rank test. Median survival ratio of treatment (AZD4547):control
(vehicle) is 44 days:33 days, and reported as a ratio of 1.333 with a 95%
confidence interval of 0.4677-3.801.


We leveraged subgroup-specific super-enhancer-regulated tran- 
scription factors to provide further insight into the lineage programs 
of ependymoma (Extended Data Fig. 10). The rationale for these 
experiments stemmed from our observation that in zebrafish embryos, 
several subgroup-specific super enhancers were active in specific 
regions within the developing central nervous system (Extended 
Data Fig. 9). We identified a FOXJ1 transcription factor network 
that was enriched in PF-EPN-B ependymoma (Extended Data 
Fig. 10). FOXJ1 is expressed during mouse embryonic develop- 
ment at E13.5 (during the expansion of radial glial cells (RGCs), 
which are candidate cells-of-origin of ependymoma) and its expression
is restricted to the regions surrounding the choroid plexus
in the mouse forebrain and hindbrain (Extended Data Fig. 10). 
Compared to other brain tumour types, FOXJ1 expression was
increased in ependymomas, with the highest levels in PF-EPN-B
tumours¹⁶ (Extended Data Fig. 10). Furthermore, the ependymal
differentiation program in RGC-derived FOXJ1-expressing cells versus
FOXJ1-knockout cells was significantly and specifically enriched in 
PF-EPN-B ependymomas (Extended Data Fig. 10). From these data, we 
hypothesized that the transcriptional program of PF-EPN-B tumours 
closely resembles a more differentiated cell type along the ependymal 
lineage compared to ependymomas previously shown to match more 
primitive RGC precursor populations’ 


© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


To inform the clinical translation of ependymoma dependencies, 
we prioritized targets for which small molecules were available by 
integrating our analysis of tumour-specific super-enhancer-regulated 
genes with the Washington University Drug Gene Interaction
Database¹⁷ (Fig. 4a, Supplementary Table 21). HDAC7, EPHA2, FGFR1
and CACNA1H were identified as candidate genes on which ependy- 
momas depend that could be responsive to small-molecule inhibitors 
(Fig. 4a). Numerous subtype-restricted lead compounds were also 
identified (Supplementary Table 22). Active super enhancers marking 
molecular dependencies for ependymomas suggested that ependy- 
moma cells would be responsive to inhibition of the BET bromodomain 
family of proteins by JQ1, which blocks protein ‘readers’ of H3K27 
acetylation. JQ1 inhibited the proliferation of ependymoma cells at 
clinically achievable nanomolar concentrations and showed limited 
efficacy against normal brain cell proliferation (Fig. 4b). Our super 
enhancer analysis identified FGFR1 small-molecule inhibitors as
possible pan-ependymoma therapies, whereas inhibitors of another
super-enhancer-associated gene product, WEE1, are likely to be active for
subsets of ependymoma. AZD1775 (WEE1 inhibitor) and AZD4547 (FGFR1
inhibitor) exhibited potent and clinically achievable anti-tumour
activity (Fig. 4c, d). Treatment of immunodeficient mice bearing
posterior fossa ependymoma intracranial xenografts (H.612) with 
AZD4547 extended survival (Fig. 4e), suggesting that chromatin land- 
scapes can inform therapeutic paradigms. 

Our study of active chromatin landscapes within ependymomas 
identified tumour- and subgroup-specific super-enhancer-driven genes 
in ependymoma as potential leads for further testing. By integrating our 
data with drug interaction databases, we identified and validated novel 
cancer dependencies of ependymoma that are responsive to pharma- 
cologic inhibition. Our study further demonstrates that knowledge of 
enhancer landscapes can be used to dissect the molecular differences 
between histologically similar tumour entities and to provide unique 
information that may inform precision therapies. These differences 
are captured by the characterization of variant enhancer and super 
enhancer loci, in addition to the reverse engineering of core transcrip- 
tional regulatory circuitries in tumours. Finally, as shown in ependy- 
momas and other tumours, knowledge of core and subgroup-specific 
transcription factors reveals a molecular basis for the oncogenic 
transcriptional programs of cancer, and provides insight into lineage 
programs that persist in the neoplastic state’. 


Online Content: Methods, along with any additional Extended Data display items and
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Patients and tumour samples. Tumour samples, clinical information, and animal 
studies were approved by local ethics institutional review boards (IRBs) from both 
the Heidelberg and Toronto institutions. Informed consent was obtained from all 
patients. No subject underwent chemotherapy or radiotherapy before the surgical 
removal of the primary tumour. In the sequencing cohort of tumour samples, at 
least 80% of tumour cell content was estimated by staining cryosections (~5 μm
thick) of each sample with haematoxylin and eosin as described previously’. 
Diagnoses were confirmed by histopathologic assessment by at least two neuro- 
pathologists, including a central pathology review that used the 2007 World Health 
Organization classification for Central Nervous System tumours. 

WES and WGS DNA library preparation and Illumina sequencing. Tumour 
and control samples were individually processed; in every case, thorough his- 
tological examination proved that each tumour consisted of over 80% tumour 
cells (in most cases >95%). DNA from tumour and control samples (blood) was 
prepared and sequenced individually. The Agilent SureSelect Human All Exon 
50-Mb target enrichment kit (v3 initially, switched to v4 subsequently) was used 
to capture all human exons for deep sequencing, using the vendor’s protocol v2.0.1. 
The SureSelect Human All Exon Kit targets regions of 50 Mb in total size, which is 
approximately 1.7% of the human genome. In brief, 3 μg genomic DNA was sheared
with a Covaris S2 to a mean size of 150 bp. Five hundred nanograms of library
DNA was hybridized for 24 h at 65°C with the SureSelect baits. The captured
fragments from the tumour samples and controls were sequenced in 105-bp single-end
mode on an Illumina HiSeq2000 deep sequencing instrument (based on Illumina, 
Inc., v3 sequencing chemistry). The median coverage of whole-exome sequenced 
tumour samples was 157-fold (range 43-469-fold) and for control samples (blood 
DNA) 146-fold (range 80-222-fold). In addition, whole-genome libraries (before 
the exome hybridization step) were sequenced (three lanes each in paired-end 
105-bp mode) on the HiSeq2000, as described". 

To increase the coverage of the samples for whole-exome sequencing, we 
used the following strategy. Exome capture was initially carried out with Agilent 
SureSelect (Human All Exon 50 Mb) in-solution reagents using the default Illumina 
adapters (without barcode). To introduce Illumina Multiplex barcodes into the 
existing libraries at a later stage, 15 ng final exome-enriched library (without
barcode) was used as a template in a 50-μl PCR reaction. The Herculase II Fusion
enzyme (Agilent) was used together with the NEBNext Universal PCR primer 
for Illumina and NEBNext Index primer (NEB #E7335S) under the following 
conditions. The initial denaturation step for 2 min at 98°C was followed by four 
cycles of 30 s 98°C, 30 s 57°C, 1 min 72°C, and a final step of 10 min at 72°C. Six 
or seven barcoded samples were then sequenced on the HiSeq2000 in 2 × 100-bp
paired-end mode. 

WGS and WES data processing. Fastq files were processed by the standardized 
alignment and variant-calling pipeline developed and applied in the context of 
the Pan-cancer Analysis of Whole Genomes (PCAWG) project
(https://github.com/ICGC-TCGA-PanCancer)!®. Here, we used the human genome assembly
hs37d5 (https://ncbi.nlm.nih.gov/assembly/2758) as a reference genome and 
GENCODE19 (http://gencodegenes.org/releases/19.html) as gene annotations. 
Germline or somatic origin of the variants and indels was determined on the basis 
of their presence or absence in the matched control tissue. 

RNA-seq data processing. Sequencing reads were aligned to the GRCh37 1000G 
reference using STAR 2.3.0’ by reporting only reads with one best alignment 
(--outFilterMultimapNmax 1). Uniquely aligned reads were counted at gene
regions using the package Subread v1.4.6 based on Gencode v19 annotations. 
Differential gene expression analysis between subgroups was performed using 
the R/Bioconductor package DESeq2 with contrast adjustment for multiple groups 
comparison. Fusion gene discovery was performed by the InFusion toolkit v.0.6.3”°. 

Chromatin immunoprecipitation. ChIP of 5-10 mg flash-frozen primary ependymoma
tumour was performed using 5 μg H3K27ac antibody per ChIP experiment
(Abcam-AB4729 (Toronto) or Active Motif-39133 (Heidelberg)). Enriched DNA 
was quantified using Picogreen (Invitrogen) and ChIP libraries were amplified and 
barcoded using the Thruplex DNA-seq library preparation kit (Rubicon Genomics) 
according to the manufacturer’s recommendations. Following library amplifica- 
tion, DNA fragments were agarose gel (1.0%) size-selected (<1 kb), assessed using 
Bioanalyzer (Agilent Technologies) and sequenced at The Centre for Applied
Genomics (The Hospital for Sick Children) using Illumina HiSeq 2000 100-bp
(Toronto cohort) and 50-bp (Heidelberg cohort) single-end sequencing.

ChIP-seq data pre-processing, enhancer and super enhancer analysis. Mapping 
of ChIP-seq data was performed as described”'. Analogous to ref. 8, H3K27ac peak 
finding was performed using MACS1.4 with default parameter settings except 
with a P-value threshold of 1 × 10⁻⁹. Peak finding for each ependymoma was
performed separately, and as a control background for each H3K27ac ChIP-seq 
sample, its matched genomic DNA was used where available. Peaks that could not 


be identified in at least two primary ependymomas and peaks contained completely
within ±2.5 kb of transcriptional start sites were excluded
from any further analysis. Afterwards, the H3K27ac peaks of the individual 
samples were merged into a single set of (non-overlapping) peaks. When comparing 
against the Roadmap Epigenomics Dataset, reads from ependymoma samples were 
trimmed to 36 bp to be consistent with processed Roadmap Epigenomics Data, and 
then pre-processed as described above. To reduce potential batch effects, enhancer 
H3K27 acetylation profiles were quantile-normalized using the preprocessCore 
package in R. Super enhancers were identified using the rank ordering of super 
enhancers (ROSE) algorithm, which classified as a super enhancer any set of two 
or more H3K27ac peaks (detected by MACS1.4, P < 1 × 10⁻⁹) within a 12.5-kb distance,
and further than 2.5 kb from a transcriptional start site. Super enhancers were 
further defined by those demonstrating the greatest levels of H3K27 acetylation as 
detected by graphing an inflection plot and selecting values for which the slope of a 
fitted curve exceeded a value of 1. In the case of tumour-specific super enhancers, 
all regions were removed that contained any overlap with a super enhancer 
detected in at least one normal brain region consisting of: anterior caudate, 
cingulate gyrus, hippocampus middle, inferior temporal lobe, mid frontal lobe, 
and substantia nigra. 
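The TSS filtering, 12.5-kb stitching, inflection-point cut-off and normal-brain filtering described above can be sketched in a few functions. This is a simplified, hypothetical re-implementation for illustration, not the ROSE code: regions are plain tuples, a stitched region's signal is the sum of its peak signals (ROSE itself counts reads over the stitched region), and the cut-off uses a discrete-slope approximation of the tangent rule (sort ascending, rescale so the diagonal has slope 1, take the first point where the local slope exceeds 1).

```python
from typing import List, Tuple

# (chromosome, start, end, H3K27ac signal); coordinates in bp, end exclusive.
Region = Tuple[str, int, int, float]


def outside_tss(peak: Region, tss: List[Tuple[str, int]], flank: int = 2_500) -> bool:
    """False if the peak lies completely within +/- flank bp of any TSS."""
    chrom, start, end, _ = peak
    return not any(c == chrom and start >= pos - flank and end <= pos + flank
                   for c, pos in tss)


def stitch(peaks: List[Region], max_gap: int = 12_500) -> List[Region]:
    """Merge same-chromosome peaks separated by <= max_gap bp, summing signal."""
    stitched: List[Region] = []
    for chrom, start, end, signal in sorted(peaks):
        if stitched and stitched[-1][0] == chrom and start - stitched[-1][2] <= max_gap:
            c, s, e, total = stitched[-1]
            stitched[-1] = (c, s, max(e, end), total + signal)
        else:
            stitched.append((chrom, start, end, signal))
    return stitched


def inflection_cutoff(signals: List[float]) -> float:
    """Signal value at which the rescaled, ranked curve first has slope > 1."""
    ys = sorted(signals)
    n = len(ys)
    if n < 2 or ys[-1] == ys[0]:
        return float("inf")  # flat curve: no super enhancers called
    span = ys[-1] - ys[0]
    for i in range(n - 1):
        # Scale so that the diagonal from (0, min) to (n, max) has slope 1.
        if (ys[i + 1] - ys[i]) * n / span > 1:
            return ys[i + 1]
    return float("inf")


def overlaps(a: Region, b: Region) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def tumour_specific(ses: List[Region], normal_ses: List[Region]) -> List[Region]:
    """Drop any super enhancer that overlaps a normal-brain super enhancer."""
    return [se for se in ses if not any(overlaps(se, n) for n in normal_ses)]
```

In use, TSS-proximal peaks are filtered with `outside_tss`, the remainder is stitched, regions with signal at or above `inflection_cutoff` of all stitched signals are kept, and `tumour_specific` removes those shared with the normal brain regions.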

t-SNE analysis of Illumina DNA methylation and enhancer data. All DNA meth- 
ylation analyses were performed in R v3.3.0 (R Development Core Team, 2015). 
Raw signal intensities were obtained from IDAT-files using minfi Bioconductor 
v1.18.2. Each sample was individually normalized by performing a background 
correction (shifting the 5th percentile of negative control probe intensities to 0) 
and a dye-bias correction (scaling the mean of normalization control probe inten- 
sities to 10,000) for both colour channels. No further normalization or transforma- 
tion steps were performed, and standard beta-values were used for downstream 
methylation analyses. The following criteria were applied to filter out probes 
prone to yield inaccurate methylation levels: removal of probes targeting the 
X and Y chromosomes (n = 11,551), removal of probes that overlap common 
SNPs (dbSNP132 Common) within the CpG or the following base (n= 7,998), 
and removal of probes not mapping uniquely to the human reference genome 
(hg19) (n = 3,965). To enable comparability with the Illumina Infinium
HumanMethylationEPIC array, we also removed probes not represented on this 
array (n= 32,260). In total, 428,799 probes were kept for analysis. For unsuper- 
vised hierarchical clustering, we selected the 10,000 most variably methylated 
probes across the dataset (s.d. > 0.264). Distance between samples was calculated 
by using 1-Pearson correlation coefficient as the distance measure. The resulting 
distance matrix was used to perform t-SNE analysis with Rtsne package v0.11. The 
following non-default parameters were used: theta=0, is_distance=T, pca=F,
max_iter = 10000. 
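The probe selection and distance step that precedes t-SNE can be sketched as follows. This is a minimal illustration, assuming beta values are held in a dictionary of probe to per-sample lists; here the number of probes `k` is a parameter, whereas the Methods report the resulting threshold for their data (s.d. > 0.264). The returned matrix is what would be handed to a t-SNE implementation that accepts precomputed distances, as Rtsne does with `is_distance = TRUE`.

```python
import math
import statistics
from typing import Dict, List


def top_variable_probes(betas: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k probes with the largest sample standard deviation."""
    return sorted(betas, key=lambda p: statistics.stdev(betas[p]), reverse=True)[:k]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) for constant vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pearson_distance_matrix(betas: Dict[str, List[float]],
                            probes: List[str]) -> List[List[float]]:
    """Sample-by-sample matrix of 1 - Pearson correlation over selected probes."""
    n_samples = len(betas[probes[0]])
    # One vector per sample, restricted to the selected probes.
    columns = [[betas[p][j] for p in probes] for j in range(n_samples)]
    return [[1.0 - pearson(columns[i], columns[j]) for j in range(n_samples)]
            for i in range(n_samples)]
```

Using 1 minus the correlation makes samples with similar methylation patterns close even if their absolute beta values are shifted.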

For clustering of H3K27ac ChIP-seq data from the Heidelberg and Toronto 
cohorts together, we processed both cohorts in single-end mode without back- 
ground using the R/Bioconductor package QSEA v.0.0.11. For each sample, we 
quantified sequencing reads as reads per kilobase per million (RPKM) at previously 
derived enhancers, neglecting enhancers at mitochondrial and sex chromosomes. 
Distance between samples was calculated by using 1-Spearman correlation coeffi- 
cient as the distance measure. The resulting distance matrix was used to perform 
the t-SNE analysis (Rtsne package v0.11). The following non-default parameters 
were used: theta=0, is_distance=T, pca=F, max_iter = 5000. 
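QSEA performs the enhancer quantification internally; the stand-alone sketch below only illustrates the two quantities named above, the RPKM of an enhancer and the 1 minus Spearman distance between samples. It assumes per-enhancer read counts and the total number of mapped reads are already known, and the chromosome labels in `EXCLUDED_CHROMS` are an assumption covering both "chr"-prefixed and hs37d5-style names.

```python
import math
import statistics
from typing import Dict, List, Tuple

Enhancer = Tuple[str, int, int]  # (chromosome, start, end)
EXCLUDED_CHROMS = {"chrM", "chrX", "chrY", "MT", "X", "Y"}


def rpkm(reads: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)


def enhancer_rpkm(counts: Dict[Enhancer, int], total_mapped: int) -> Dict[Enhancer, float]:
    """RPKM per enhancer, skipping mitochondrial and sex chromosomes."""
    return {(c, s, e): rpkm(n, e - s, total_mapped)
            for (c, s, e), n in counts.items() if c not in EXCLUDED_CHROMS}


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (math.sqrt(sum((a - mx) ** 2 for a in x))
                  * math.sqrt(sum((b - my) ** 2 for b in y)))


def spearman_distance(x: List[float], y: List[float]) -> float:
    """1 - Spearman correlation, computed as Pearson correlation of ranks."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))
```

A rank-based distance is less sensitive than Pearson correlation to a few very strongly acetylated enhancers dominating the comparison, which may matter when two cohorts were profiled with different antibodies.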

Unsupervised hierarchical clustering analysis of variant enhancer loci. 
A matrix of the normalized H3K27ac density was generated in HOMER (v3.12) 
based on the identified consensus typical enhancers. Variant enhancer loci (VELs) 
were defined as the enhancers that exhibited the greatest median absolute deviation
(MAD) across all samples used for clustering. In the case of unsupervised hierar- 
chical clustering between ependymoma, Roadmap Epigenomics, and ENCODE 
samples, the top 10,000 VELs were retained. These enhancers were used for 
unsupervised hierarchical clustering using a Pearson correlation as a distance 
metric. In the case of super enhancers, a matrix was generated in HOMER using the 
consensus super enhancer BED files of normalized H3K27ac densities across 
all samples. Non-negative matrix factorization was performed using all super 
enhancer regions, using the methodology described previously, with 20 iterations, 
across 10 rank classifications”. 
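Selecting variant enhancer loci by median absolute deviation can be sketched as below. This is a minimal illustration, assuming the normalized H3K27ac density matrix (produced with HOMER in the Methods) is available as a dictionary of enhancer to per-sample values; the clustering itself is not reproduced.

```python
import statistics
from typing import Dict, List


def mad(values: List[float]) -> float:
    """Unscaled median absolute deviation.

    R's mad() multiplies this by 1.4826 by default; a constant factor
    does not change which enhancers rank highest.
    """
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def variant_enhancer_loci(density: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k enhancers with the greatest MAD across samples."""
    return sorted(density, key=lambda e: mad(density[e]), reverse=True)[:k]
```

MAD is used rather than variance because a single outlier sample barely moves a median, so selected loci tend to vary across many samples rather than in one.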

Identification of super-enhancer-associated pathways and drug-gene inter- 
actions. Differential super-enhancer-associated genes in ependymomas or 
ependymoma subgroups were imported into G-Profiler” for pathway analysis, 
restricted to GO, KEGG and REACTOME gene sets. Cytoscape (v3.2.1) and the
EnrichmentMap plug-in were used to generate networks for gene sets enriched
with an FDR cut-off of <0.05. Super-enhancer-associated genes were also used
to query the Washington University Drug Gene Interaction database, restricted 
to expert-curated drug-target interactions to identify novel and druggable gene 
targets)’. 
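The FDR cut-off of <0.05 presupposes a multiple-testing correction. g:Profiler offers several (its default is its own g:SCS method), and the Methods do not state which was applied here, so the Benjamini-Hochberg procedure below is shown only as one common way to obtain FDR-adjusted P values; the gene-set names and P values in the test are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(names: List[str], pvalues: List[float], fdr: float = 0.05) -> List[str]:
    """Names whose adjusted P value falls below the FDR cut-off."""
    return [n for n, q in zip(names, benjamini_hochberg(pvalues)) if q < fdr]
```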
Analysis of super enhancers with subgroup-specific enhancer activity (SSEA).
To identify subgroup-specific enhancer activity, we employed the R/Bioconductor 
package QSEA v.0.0.11. Previously calculated enhancer regions (see above)
were provided as regions of interest and tiled into 500-bp windows. For each 
sample, H3K27ac ChIP-seq enrichments were calculated at these tiled enhancers 
and were library size-normalized by TMM. In addition, matched blood and 
tumour WGS data were imported and copy number variations were calculated 
for all ependymoma samples using the findCNV() function of the QSEA package. 
CNV-aware subgroup-specific enhancer activity was then calculated by comparing 
H3K27ac ChIP-seq enrichments in one subgroup against the other subgroups by 
fitting general linear models with respect to the presence of CNVs (non-default
parameters are norm_method = "nrpkm", minRowSum = 10, fdr_th=10~°,
direction = "gain"). We excluded 500-bp windows that were significant in more than one
subgroup. For each subgroup, we stitched all significant 500-bp windows within a 
distance of 12.5 kb together, summed their normalized H3K27ac ChIP-seq enrich- 
ment values (nRPKM), and ranked them accordingly. Analogous to the definition 
of super enhancers, we define the first occurrence of a slope >1 (from high to 
low enrichment) as a threshold for distinguishing between extended stretches of 
significant SE-SSEAs and TE-SSEAs. 
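The post-processing of significant 500-bp windows (removing windows significant in more than one subgroup, stitching within 12.5 kb, summing nRPKM and ranking) can be sketched as follows. This is an illustrative re-implementation under stated assumptions: QSEA's statistical testing is taken as already done, each window is a hypothetical (chromosome, start, nRPKM) tuple, the subgroup labels in the test are placeholders, and the slope-based threshold separating SE-SSEAs from TE-SSEAs is not reproduced.

```python
from collections import Counter
from typing import Dict, List, Tuple

WINDOW_BP = 500
MAX_GAP_BP = 12_500
Window = Tuple[str, int, float]       # (chromosome, window start, nRPKM)
Region = Tuple[str, int, int, float]  # (chromosome, start, end, summed nRPKM)


def subgroup_exclusive(sig: Dict[str, List[Window]]) -> Dict[str, List[Window]]:
    """Drop windows that are significant in more than one subgroup."""
    counts = Counter(key for wins in sig.values()
                     for key in {(c, s) for c, s, _ in wins})
    return {group: [w for w in wins if counts[(w[0], w[1])] == 1]
            for group, wins in sig.items()}


def stitch_windows(windows: List[Window]) -> List[Region]:
    """Stitch windows within MAX_GAP_BP, sum nRPKM, rank from high to low."""
    regions: List[Region] = []
    for chrom, start, value in sorted(windows):
        end = start + WINDOW_BP
        if regions and regions[-1][0] == chrom and start - regions[-1][2] <= MAX_GAP_BP:
            c, s, e, total = regions[-1]
            regions[-1] = (c, s, max(e, end), total + value)
        else:
            regions.append((chrom, start, end, value))
    return sorted(regions, key=lambda r: r[3], reverse=True)
```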

Calculating core regulatory networks for super-enhancer-associated 
transcription factors. To quantify the interaction network of transcription 
factor regulation, we calculated the inward and outward binding degree of all 
super-enhancer-associated transcription factors!*. For all promoters within 
100 kb, the most acetylated promoter was assigned as the target of the super 
enhancer (excluding promoters that overlap super enhancers). If there were no 
active promoters within 100 kb, the super enhancer was assigned to the nearest 
active promoter. All super-enhancer-associated promoters annotated to regulate 
a transcription factor were considered as the node-list for network construction. 
For any given transcription factor (TFi), the IN degree was defined as the number 
of transcription factors with an enriched binding motif at the proximal super 
enhancer or promoter of TFi. The OUT degree was defined as the number of tran- 
scription factor-associated super enhancers containing an enriched binding site 
for TFi. Within any given super enhancer, enriched transcription factor binding 
sites were determined at putative nucleosome-free regions (valleys) flanked by 
high levels of H3K27ac. Valleys were calculated using an adapted algorithm”. In 
these regions, we searched for enriched transcription factor binding sites using the 
FIMO59 algorithm with transcription factor position weight matrices defined in 
the TRANSFAC database‘. An FDR cut-off of 0.01 was used to identify enriched 
transcription factor-binding sites. 
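The target assignment and IN/OUT degree definitions above can be sketched with plain sets. This is a minimal illustration under stated assumptions: motif enrichment (FIMO at H3K27ac valleys) is taken as already computed and supplied as a mapping from super enhancer ID to the set of TFs with an enriched motif; only super enhancers (not promoters) are scanned for the IN degree; only TFs in the node list count as regulators; and all identifiers are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Promoter = Tuple[str, str, int, float]  # (gene, chromosome, TSS position, promoter H3K27ac)


def assign_target(se: Tuple[str, int, int], promoters: List[Promoter],
                  window: int = 100_000) -> Optional[str]:
    """Most acetylated active promoter within `window` bp of the super enhancer,
    else the nearest active promoter; promoters inside the SE are skipped."""
    chrom, start, end = se

    def distance(pos: int) -> int:
        return max(start - pos, pos - end, 0)

    candidates = [p for p in promoters
                  if p[1] == chrom and not (start <= p[2] <= end)]
    if not candidates:
        return None
    nearby = [p for p in candidates if distance(p[2]) <= window]
    if nearby:
        return max(nearby, key=lambda p: p[3])[0]
    return min(candidates, key=lambda p: distance(p[2]))[0]


def in_out_degrees(tf_to_ses: Dict[str, Set[str]],
                   motif_hits: Dict[str, Set[str]]) -> Dict[str, Tuple[int, int]]:
    """IN: node TFs with an enriched motif in the TF's own super enhancer(s).
    OUT: TF-associated super enhancers containing an enriched motif for the TF."""
    nodes = set(tf_to_ses)
    tf_ses = set().union(*tf_to_ses.values())
    degrees: Dict[str, Tuple[int, int]] = {}
    for tf, ses in tf_to_ses.items():
        regulators: Set[str] = set()
        for se in ses:
            regulators |= motif_hits.get(se, set()) & nodes
        out_degree = sum(1 for se in tf_ses if tf in motif_hits.get(se, set()))
        degrees[tf] = (len(regulators), out_degree)
    return degrees
```

A TF whose motif is enriched in its own super enhancer appears in its own regulator set, which corresponds to the self-loop shown in Fig. 3b.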

Identification of regulatory networks at enhancers with subgroup-specific 
enhancer activity. Subgroup-specific transcription factor-regulatory networks were 
constructed as previously described with only a few amendments**°, H3K27ac 
data of the samples within the same subgroup were combined. For each subgroup, 
nucleosome-free regions (NFRs) were identified using the findPeaks function 
of HOMER® (http://homer.salk.edu/homer/ngs/index.html) with option -nft. 
ENCODE transcription factor motifs and their mapped positions in the genome 
were downloaded from http://compbio.mit.edu/encode-motifs/. For each tran- 
scription factor, contingency tables containing the number of NFRs overlapping 
and non-overlapping with the respective transcription factor were constructed. 
The significance of enrichment of transcription factors in NFRs of enhancers with 
subgroup-specific activity was determined using the χ² test. The resulting P values
were corrected for multiple testing (FDR <0.01). Transcription factor enrichments 
were calculated as the ratio between observed counts over expected counts. To 
identify enhancer target genes, we accessed publicly available topology-associated 
domains (TADs) previously obtained in IMR90 cells. Each SSEA was assigned to 
its enclosing TAD and protein-coding genes within the same TAD were identi- 
fied. Correlation tests (Spearman's rank correlation coefficient) for SSEA H3K27ac 
enrichment and gene expression level within the same TAD were performed. 
After repeating this procedure for each enhancer, all P values obtained were com- 
bined and corrected for multiple testing using the Bioconductor package qvalue. 
Correlations with an FDR less than 1% were preserved. To derive subgroup- 
specific transcription factor regulatory networks, we selected the top 50% enriched 
transcription factors in each subgroup, which also have the highest expression in 
the respective subgroup compared to the other subgroups. The resulting networks 
highlight transcription factors (red or orange nodes) whose binding sites are sig- 
nificantly enriched at enhancers with SSEA. By gene-enhancer correlation analysis 
restricted by TAD domains (see above), these transcription factors were assigned 
to their likely target genes (blue nodes). Networks were visualized using by Gephi 
(http://gephi.github.io/). 
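The enrichment step (a contingency table per transcription factor, a χ² test, FDR correction and an observed/expected ratio) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The 2 × 2 layout (subgroup-specific NFRs versus a background set of NFRs, overlapping versus not overlapping a motif) is an assumption. So are the Pearson χ² test without Yates correction and the Benjamini-Hochberg procedure. The counts in the usage lines are invented.

```python
from __future__ import annotations

import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Pearson chi-squared test (no Yates correction) for the 2x2 table
        [[a, b],   # subgroup-specific NFRs: overlapping / not overlapping a motif
         [c, d]]   # background NFRs (assumed): overlapping / not overlapping
    Returns (chi2 statistic, P value, observed/expected ratio for cell a).
    All row and column totals must be non-zero.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    expected_a = row_totals[0] * col_totals[0] / n
    return chi2, p_value, a / expected_a


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts (a, b, c, d) for two transcription factors.
    tables = {"TF_A": (120, 880, 400, 8600), "TF_B": (30, 970, 300, 8700)}
    results = {tf: chi2_2x2(*counts) for tf, counts in tables.items()}
    q_values = benjamini_hochberg([r[1] for r in results.values()])
    for (tf, (chi2, p, ratio)), q in zip(results.items(), q_values):
        flag = "enriched" if q < 0.01 and ratio > 1 else "not enriched"
        print(f"{tf}: chi2={chi2:.1f} obs/exp={ratio:.2f} q={q:.2g} -> {flag}")
```

The "enriched" rule in the usage lines combines FDR < 0.01 with an observed/expected ratio above 1. This mirrors the thresholds in the text, but how the authors combined them is an assumption.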

ATAC-seq chromatin preparation and sequencing. Freshly cultured ependymoma
cells were prepared for ATAC-seq as previously described. In brief, nuclei were
prepared from ~50,000 cells by spinning at 600g for 10 min at 4 °C, followed by
a PBS wash and centrifugation at 600g for 5 min. Cells were lysed using ice-cold
lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1%) and
centrifuged for 10 min at 600g at 4 °C. The supernatant was removed and the pellet
resuspended in 50 µl transposase mix (25 µl 2× TD buffer, 2.5 µl transposase and
22.5 µl water; FC-121-1030, Illumina) for 30 min at 37 °C. Library amplification was
performed using the NEBNext High-Fidelity 2× PCR Master Mix (#M0541S,
New England Biolabs) according to previously published PCR conditions. PCR
reactions were purified using a QIAGEN MinElute kit, followed by a size-selection
step using a standard gel extraction protocol to isolate fragments of ~240-360 bp.
ATAC-seq libraries were sequenced using single-end 50-bp reads on the Illumina
HiSeq 2000 platform. Raw reads were adaptor-trimmed using Trim Galore (v0.2.5)
and aligned to the genome with Bowtie (v1.0.1) with the -m 1 option enabled to
retain only uniquely aligned high-quality reads. Peaks were called using MACS2
(v2.1.0.20140616) with the options -q 0.05 to retain significant peaks and
-shiftsize 50 to account for the transposase fingerprint; otherwise, default
parameters were used. Tag count libraries and bedgraph files were constructed
using HOMER (v4.7).

Ependymoma culture experiments. Ependymoma cell cultures were isolated
from patients and cultured on laminin (Sigma) in neurobasal medium
(Invitrogen) supplemented with sodium pyruvate (Invitrogen), B27 (Invitrogen),
glutamine (Cleveland Clinic Media Core), human EGF (Invitrogen), human basic
FGF (Invitrogen) and penicillin/streptomycin (Cleveland Clinic Media Core).
Medium was replenished every other day, leaving ~50% conditioned
medium to encourage continued cell proliferation. Cell viability assays were
performed in 96-well plates using Alamar Blue (Invitrogen) according to the
manufacturer's instructions. Drug-response assays were performed by seeding
cells overnight, treating them the following day with increasing drug
concentrations and reading Alamar Blue absorbance after 72 h of treatment. AZD4547
and MK1775 were obtained from Selleck Chemicals. JQ1 was provided by the
laboratory of J. E. Bradner (Harvard). All cell lines were STR profiled for
authenticity and confirmed to be mycoplasma-free using a PCR-based detection strategy
with positive and negative controls.
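The section states only that Alamar Blue readings were taken after 72 h of treatment. It does not say how the readings were normalized. The sketch below shows one common convention, offered purely as an assumption: subtract the mean of cell-free blank wells, then express each treated well as a percentage of the vehicle-treated mean. The blank and vehicle wells, the concentrations and all readings are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def percent_viability(treated: list[float], vehicle: list[float],
                      blank: list[float]) -> list[float]:
    """Blank-subtract each treated well and express it as % of the vehicle mean.

    The blank (medium + dye, no cells) and vehicle wells are assumptions of
    this sketch; the source does not describe its normalization.
    """
    background = mean(blank)
    vehicle_signal = mean(vehicle) - background
    if vehicle_signal <= 0:
        raise ValueError("vehicle signal must exceed the blank background")
    return [100.0 * (x - background) / vehicle_signal for x in treated]


def dose_response(readings_by_dose: dict[float, list[float]],
                  vehicle: list[float], blank: list[float]) -> dict[float, float]:
    """Mean % viability of the replicate wells at each drug concentration."""
    return {dose: mean(percent_viability(wells, vehicle, blank))
            for dose, wells in sorted(readings_by_dose.items())}


if __name__ == "__main__":
    blank = [0.10, 0.11, 0.09]
    vehicle = [1.08, 1.12, 1.10]
    readings = {0.1: [1.02, 0.98, 1.00], 1.0: [0.61, 0.59, 0.60],
                10.0: [0.31, 0.29, 0.30]}
    for dose, viability in dose_response(readings, vehicle, blank).items():
        print(f"{dose:g} uM: {viability:.1f}% of vehicle")
```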

RNA interference of enhancer-associated genes. Lentiviral shRNA clones (Sigma
Mission RNAi) targeting super-enhancer-associated genes and two non-targeting
controls (SHC002, SHC007) were purchased from Sigma (Supplementary Table 23).
These vectors were co-transfected into HEK 293FT cells with the packaging
vectors psPAX2 (Addgene) and pCI-VSVG (Addgene) using a calcium phosphate
method to produce viable lentivirus. Knockdown efficiency of the different lentiviral
shRNA clones was determined by quantitative reverse transcription PCR.
Cells infected with lentivirus expressing the indicated shRNAs were plated in
96-well plates at 1,000 cells per well. Cell viability was determined the indicated
number of days after plating using the Alamar Blue assay (Life Technologies)
or CellTiter-Glo (Promega).
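The section does not say how knockdown efficiency was calculated from the qRT-PCR data. A widely used option is the comparative Ct (2^-ΔΔCt) method, sketched here as an assumption. It normalizes the target gene to a reference gene and compares cells carrying a gene-specific shRNA with cells carrying a control shRNA. It also presumes near-100% amplification efficiency for both genes. The choice of reference gene and every Ct value are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def relative_expression(target_kd: list[float], ref_kd: list[float],
                        target_ctrl: list[float], ref_ctrl: list[float]) -> float:
    """2^-ddCt fold change of the target gene in knockdown vs control cells.

    Each argument holds replicate Ct values; assumes ~100% PCR efficiency
    for both the target and the reference (housekeeping) gene.
    """
    delta_kd = mean(target_kd) - mean(ref_kd)          # dCt in knockdown cells
    delta_ctrl = mean(target_ctrl) - mean(ref_ctrl)    # dCt in control-shRNA cells
    return 2.0 ** -(delta_kd - delta_ctrl)


def knockdown_percent(target_kd: list[float], ref_kd: list[float],
                      target_ctrl: list[float], ref_ctrl: list[float]) -> float:
    """Percentage reduction of target mRNA relative to the control shRNA."""
    return 100.0 * (1.0 - relative_expression(target_kd, ref_kd,
                                              target_ctrl, ref_ctrl))


if __name__ == "__main__":
    # Hypothetical triplicate Ct values for one shRNA clone.
    kd = knockdown_percent([26.0, 26.1, 25.9], [18.0, 18.1, 17.9],
                           [24.0, 23.9, 24.1], [18.0, 18.0, 18.0])
    print(f"knockdown: {kd:.0f}%")
```

In this example the target Ct rises by about two cycles relative to the reference gene. That corresponds to roughly a fourfold drop in mRNA, or about 75% knockdown.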

CRISPR-Cas9-mediated repression of enhancer regions. CRISPR-Cas9
sgRNAs were identified and designed using the MIT CRISPR design tool, and
non-targeting control sgRNAs (lentiGuide-Puro D103) were selected from the
GeCKOv2 library. All sgRNA sequences are listed in Supplementary Table 23.
sgRNAs were cloned into lentiGuide-Puro (Addgene, 52963). Lentivirus expressing
dCas9-KRAB (gift from the M. Meyerson laboratory) was used to infect EP1-NS cells,
after which cells were selected for 48 h with 10 µg/ml blasticidin. These cells
were then infected with selected lentiGuide-Puro sgRNA constructs and selected
for 48 h with 1 µg/ml puromycin. After selection, cells were plated in
96-well plates for 48 h and cell viability was assessed using Alamar Blue
(Life Technologies).

In vivo animal experiments. We followed the Guidelines for the Care and Use of
Mammals in Neuroscience and Behavioural Research from the National Research
Council to estimate the minimal number of animals necessary to assess statistical
significance. The number of animals per arm was based on the following
calculation: $n = 1 + 2C(s/d)^2$, where n is the number of animals per arm, C = 7.85
when α = 0.05 and 1 − β = 0.8 (significance level of 5% with a power of 80%), s is
the standard deviation and d is the difference to be detected. All animal experiments
were performed in accordance with local IACUC regulations and protocols. Animal
experiments were conducted in a single-blinded fashion, and endpoints were assessed
by an independent animal technician in the laboratory. 250,000 H612 cells were
xenografted intracranially into female NOD/SCID mice. Tumours were allowed
to develop for 14 days, after which mice were independently randomized into a
treatment or vehicle group. AZD4547 (25 mg/kg/d) or vehicle (Sigma: 1% Tween-80)
was administered daily by oral gavage. Survival of mice was plotted using a
Kaplan-Meier curve and quantified using a log-rank test. Our study did not measure
tumour size or volume directly. We monitored neurological signs and behaviours
associated with brain tumour development in accordance with our IACUC protocols
and regulations.
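In the sample-size formula above, the constant depends on the chosen significance level and power: C = (z_{1-α/2} + z_{1-β})^2. For a two-sided α of 0.05 and 80% power this gives (1.960 + 0.842)^2 ≈ 7.85, the value quoted. The sketch below reproduces that constant and applies the formula. Rounding up to a whole animal, and the example effect sizes, are assumptions added here for illustration.

```python
import math
from statistics import NormalDist


def c_constant(alpha: float = 0.05, power: float = 0.8) -> float:
    """C = (z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def animals_per_arm(s: float, d: float, c: float = 7.85) -> int:
    """n = 1 + 2C(s/d)^2, rounded up to a whole animal (rounding is assumed)."""
    if s <= 0 or d <= 0:
        raise ValueError("s and d must be positive")
    return math.ceil(1 + 2 * c * (s / d) ** 2)


if __name__ == "__main__":
    print(f"C for alpha=0.05, power=0.8: {c_constant():.3f}")
    # Hypothetical effect sizes: differences of 1, 1.5 and 2 standard deviations.
    for d in (1.0, 1.5, 2.0):
        print(f"s=1, d={d}: {animals_per_arm(1.0, d)} animals per arm")
```

Because n scales with (s/d)^2, halving the detectable difference roughly quadruples the animals needed per arm.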
Data availability. All raw data files were deposited in the European Genome-phenome
Archive (https://www.ebi.ac.uk/ega/home) under accession number
EGAS00001002696.
Extended Data Figure 1 | DNA fingerprint analysis of ependymoma
sequence data. a, b, Unsupervised clustering of ChIP-seq, RNA-seq, WES,
WGS and Illumina DNA methylation profiles with genotypes that have an
average heterozygosity score greater than 0.25 in the Heidelberg (n = 25
independent samples) (a) and Toronto (n = 18 independent
samples) (b) cohorts.
alterations detected by WGS in primary ependymoma samples (n = 24
independent samples).
Extended Data Figure 3 | Preprocessing and clustering of ependymoma
H3K27ac profiles. a, b, Box plots of H3K27ac enhancer profiles
(n = 556,676 enhancer loci evaluated per sample) before quantile
normalization for the Heidelberg (n = 24 independent samples) (a) and
Toronto (n = 18 independent samples) (b) cohorts compared to the Roadmap
Epigenomics and ENCODE cohorts (n = 98 independent samples). Box
plots show the centre (median), upper and lower quartiles,
and dotted lines indicating minima and maxima per sample. c, d, Box plots
of H3K27ac enhancers after quantile normalization for the Heidelberg
(n = 24 independent samples) (c) and Toronto (n = 18 independent
samples) (d) cohorts compared to the Roadmap Epigenomics cohort
(n = 98 independent samples). e, f, Unsupervised hierarchical clustering
of enhancer profiles using the top 10,000 most variable enhancer
loci identified in the Roadmap Epigenomics cohort together with the Heidelberg
(n = 122 independent samples) (e) and Toronto (n = 116 samples)
(f) cohorts, compared in a pairwise fashion using Spearman correlation.
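Panels a-d compare enhancer-signal distributions before and after quantile normalization. The sketch below shows the standard procedure. Each sample's sorted values are replaced by the mean of all sorted samples at that rank, so every sample ends up with the same distribution. It is an illustration only: the caption does not say which implementation or tie-handling rule was used. This sketch breaks ties by position, and the toy values are hypothetical.

```python
from __future__ import annotations


def quantile_normalize(samples: list[list[float]]) -> list[list[float]]:
    """Quantile-normalize equal-length samples (one list of locus values each).

    Every sample receives the same distribution: the rank-wise mean of the
    sorted samples. Ties are broken by position (a simplification).
    """
    if not samples:
        return []
    n_loci = len(samples[0])
    if any(len(s) != n_loci for s in samples):
        raise ValueError("all samples must cover the same loci")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean value at each rank across samples.
    reference = [sum(col[r] for col in sorted_samples) / len(samples)
                 for r in range(n_loci)]
    normalized = []
    for s in samples:
        # Loci ordered from lowest to highest signal within this sample.
        order = sorted(range(n_loci), key=lambda i: s[i])
        out = [0.0] * n_loci
        for rank, locus in enumerate(order):
            out[locus] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Toy H3K27ac signal for three loci in three samples (hypothetical values).
    toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0]]
    for row in quantile_normalize(toy):
        print([round(v, 3) for v in row])
```

After normalization, the box plots of all samples would coincide by construction, as in panels c and d. Only each locus's rank within its sample is preserved.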
Extended Data Figure 4 | Ependymoma enhancer supporting data.
a, Number of unique H3K27ac peaks detected by MACS1.4 (P < 1x 10° cut-off)
with increasing sample number in the Heidelberg cohort (n = 24
independent samples). b, Box plot of gene expression values comparing
genes associated with typical enhancers (n = 9,826 genes) versus super enhancers
(n = 1,682 genes). Statistical significance was assessed using a two-sided
Wilcoxon rank-sum test. Box plots show the centre (median), upper and
lower quartiles, and dotted lines indicating minima and maxima.
c, Frequency of enhancer and super enhancer regions as a function of
size in base pairs. d, Dot plots illustrating the numbers of super enhancers
detected in the Heidelberg (n = 24 independent samples), Toronto (n = 18
independent samples) and normal brain (n = 7 independent samples)
cohorts. The horizontal bar indicates the mean. e, Heatmap illustrating
significantly gained and lost enhancer loci in both ependymoma cohorts
compared to normal brain samples. Comparisons were evaluated using a
two-sided Wilcoxon rank-sum test with FDR correction and a cut-off
of FDR < 0.05. f, Example plots of normalized and scaled H3K27ac
RPKM profiles at example ependymoma candidate genes in Heidelberg
ependymomas and normal brain (NB) (n = 32 independent samples).
g, Comparison of gene expression of ependymoma super-enhancer-associated
genes derived from ref. 11 (n = 83 independent samples)
with normal brain (n = 172 independent samples). Statistical significance
was assessed using a two-sided Wilcoxon rank-sum test. h, Table
comparing the number and percentage of confirmations between the Heidelberg
(n = 24 independent samples) and Toronto (n = 18 independent samples)
ependymoma cohorts. i, g:Profiler pathway-enrichment analysis of
ependymoma-specific super-enhancer-associated genes in the Toronto
cohort (n = 18 independent samples), with statistical significance
determined using a hypergeometric test. j, Overlap analysis measured
by a two-sided binomial test between tumour-specific ependymoma
super enhancers and cancer census genes from the Catalogue of Somatic
Mutations in Cancer (COSMIC) database. k, Classification of tumour-specific
ependymoma super enhancer genes also found in the COSMIC
database as tumour suppressor genes (n = 12), oncogenes (n = 26) or
unknown (n = 21).
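Several panels report a two-sided Wilcoxon rank-sum test. The sketch below implements that test with the large-sample normal approximation. Ties receive average ranks, and the variance is tie-corrected. No continuity correction is applied, and for small groups an exact test would be preferable. The captions do not name the software used, so this is an illustrative reimplementation, not the authors' code. The example values are hypothetical.

```python
from __future__ import annotations

import math


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U for sample x, z score, two-sided P value). Ties receive average
    ranks and the variance is tie-corrected; no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j + 2) / 2  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p_value


if __name__ == "__main__":
    # Hypothetical expression values for two groups of genes.
    typical = [2.1, 3.4, 1.9, 2.8, 3.0, 2.2]
    super_enh = [4.5, 3.9, 5.2, 4.1, 3.3, 4.8]
    u, z, p = rank_sum_test(typical, super_enh)
    print(f"U = {u}, z = {z:.2f}, two-sided P = {p:.3g}")
```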
Extended Data Figure 5 | Subgroup-specific enhancers of ependymoma.
a, b, Heatmaps of all subgroup-specific active enhancers detected in
ependymomas in the Heidelberg (n = 24 independent samples) (a) and
Toronto (n = 18 independent samples) (b) cohorts. c, Box plot of gene
expression for ependymoma SE-SSEA-associated genes in the Heidelberg
cohort (n = 24 independent samples).
Comparisons were made using a two-sided Wilcoxon rank-sum test.
Box plots show the centre (median), upper and lower quartiles,
and dotted lines indicate minima and maxima. d-f, Venn diagrams
of the number and percentage of subgroup-specific super-enhancer-associated
loci validated between the Heidelberg and Toronto cohorts.
g, h, Non-negative matrix factorization of ependymoma super enhancer profiles
in the Heidelberg (n = 24 independent samples) and Toronto (n = 18
independent samples) cohorts. i, Normalized H3K27ac profiles for
subgroup-specific genomic example loci in the Heidelberg cohort with
at least three biological replicates per subgroup, with the exception of
ST-EPN-SE, shown as a biological duplicate. j, g:Profiler pathway-enrichment
analysis of ependymoma subgroup-specific super-enhancer-associated
genes in the Heidelberg cohort (n = 24 independent samples)
with statistical significance determined using a hypergeometric test.
k-n, H3K27ac profiles surrounding the EPHB2 (k) and CCND1 (m)
loci in the Heidelberg cohort with at least three biological replicates
per subgroup, with the exception of ST-EPN-SE, shown as a biological
duplicate. EPHB2 (l) and CCND1 (n) expression by RNA-seq across
ependymoma subgroups in the Heidelberg cohort, with horizontal bars
indicating the median value and each dot representing an independent
ependymoma sample (n = 24 independent samples).
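The pathway-enrichment panels use a hypergeometric test. For one pathway term, the P value is the upper-tail probability of seeing at least the observed overlap between the query genes and the term's genes, given the background gene set. A minimal exact sketch follows. The background size, term size, query size and overlap are hypothetical. g:Profiler also applies its own multiple-testing correction, which this sketch omits.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: background genes; successes: genes annotated to the term;
    draws: query genes; k: observed overlap between query and term.
    """
    if not 0 <= successes <= population or not 0 <= draws <= population:
        raise ValueError("invalid hypergeometric parameters")
    upper = min(successes, draws)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(max(k, 0), upper + 1))
    return tail / comb(population, draws)


if __name__ == "__main__":
    background, term_size, query, overlap = 20000, 200, 100, 8
    expected = query * term_size / background
    p = hypergeom_sf(overlap, background, term_size, query)
    print(f"expected overlap {expected:.1f}, observed {overlap}, P = {p:.2e}")
```

Exact integer arithmetic avoids underflow in the binomial coefficients. The cost is speed for very large gene sets, where a log-space implementation would be the usual choice.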
Extended Data Figure 6 | Workflow describing the functional validation
of ependymoma super enhancer genes. a, Workflow of super-enhancer
target-gene prioritization for functional evaluation. b, Bar chart comparing
the top-ranked super-enhancer-associated genes against top-ranked genes
Extended Data Figure 7 | RNA interference of ependymoma super
enhancer genes. a, Individual shRNA time-course knockdown
experiments in EP1-NS (ST-EPN-RELA) cells, using two shRNA
constructs (shRNA.1 and shRNA.2) compared to two controls
(shCONTROL.1 and shCONTROL.2). Shown are time-course
experiments for 19 genes performed in six technical replicates.
b, Ependymoma cell viability (EP1-NS) following treatment with shRNAs
targeting super-enhancer-associated genes over a seven-day time course
(in alphabetical order). Cell viability data are shown for the non-targeting
controls shCONTROL.1 (black) and shCONTROL.2 (grey), and for two gene-specific
shRNA constructs, shRNA.1 (red) and shRNA.2 (pink).
Extended Data Figure 8 | Validation of ependymoma subgroup-specific
super enhancer genes. a, H3K27ac profiles at the ependymoma-specific
super enhancer locus IGF2BP1 in the Heidelberg cohort (n = 24
independent samples) with at least three biological replicates per
subgroup, with the exception of ST-EPN-SE, which is shown as a biological
duplicate. b, IGF2BP1 gene expression derived from RNA-seq data for
the Heidelberg cohort (n = 24 independent samples), with a horizontal
bar for each subgroup indicating the mean. c, d, Normalized survival
of PF-EPN-A (S15) primary cultures (c) and EP1-NS cell cultures (d)
following shRNA knockdown of IGF2BP1 with two independent non-overlapping
shRNA constructs compared to shCONTROL.1. Experiments were
performed as six technical replicates and independently validated in three
biological replicates. Horizontal bars indicate mean values. e, H3K27ac
profiles at the ependymoma-specific super enhancer locus CACNA1H
in the Heidelberg cohort with at least three biological replicates per
subgroup, with the exception of ST-EPN-SE, which is shown as a biological
duplicate. f, H3K27ac profiles surrounding the CACNA1H locus in a
ST-EPN-RELA model (EP1-NS), a PF-EPN-A model (S15) and a normal
neural stem cell control, performed in biological duplicates. g, CACNA1H
gene expression derived from RNA-seq data for the Heidelberg cohort
(n = 24 independent samples), with a horizontal bar for each subgroup
indicating the mean. h, i, Normalized survival of PF-EPN-A
(S15) primary cultures (h) and EP1-NS (i) cell cultures following shRNA
knockdown of CACNA1H with two shRNA constructs compared to
shCONTROL.1. Experiments were performed as four technical replicates
and independently validated in three biological replicates. Horizontal
bars indicate mean values. j, Normalized cell survival of EP1-NS, S15
and NSC194 cells treated with increasing concentrations of mibefradil.
Technical triplicates are shown; results were replicated in biological triplicates.
k, Overlay of ATAC-seq and H3K27ac ChIP-seq data centred on ATAC-seq
peak regions identified in the ST-EPN-RELA cell culture EP1-NS.
l, CRISPR-dCas9 targeting of CACNA1H active enhancers impairs
CACNA1H expression. H3K27ac ChIP-seq (top) and ATAC-seq (bottom)
tracks surrounding the CACNA1H locus, indicating regions targeted by
CRISPR-dCas9 sgRNA complexes. Region 1 (R1) indicates a negative
control region devoid of H3K27ac (green), while regions 2-4 (R2-R4)
indicate experimental regions under evaluation. Experiments were replicated
in biological duplicates. m, Gene expression for various sgRNA constructs
relative to a 'dummy' targeting control (D103), negative control (green)
and uninfected control. All group comparisons were made using a two-sided
Wilcoxon rank-sum test; error bars show s.d. and horizontal bars
indicate mean values. Experiments were replicated in biological triplicates.
Extended Data Figure 9 | Validation of ependymoma transcription
factors. a, b, Gene expression of 'high activity' transcription factors
(ranked <50) (a) and 'low activity' transcription factors (ranked >50)
(b) in ependymoma (n = 83 independent samples) versus normal brain
tissue (n = 172 independent samples). Box plots show the median value
(horizontal bar), interquartile range and dotted line representing the
data range. Comparison between groups was assessed using a two-sided
Wilcoxon rank-sum test. c, Constituent enhancer activity in the central
nervous system (CNS) of developing zebrafish embryos derived from
subgroup-specific super enhancers identified in ependymomas.
Extended Data Figure 10 | Putative cell lineage programs of origin
uncovered by transcription factor mapping. a-c, Immunohistochemical
staining of Foxj1 at day 13.5 of mouse embryonic development (E13.5).
Staining in discrete regions encompassing the choroid plexus and
ependymal layer is shown in the forebrain (b) and hindbrain (c).
d, log2-normalized gene expression of FOXJ1 in ependymoma (n = 83
independent samples) compared to independent sample cohorts of the
following tissue types: normal brain (n = 172), paediatric glioma (n = 53),
glioblastoma (n = 84), atypical teratoid/rhabdoid tumours (n = 18),
medulloblastoma (n = 62) and pilocytic astrocytoma (n = 41). Horizontal
bar indicates the mean value. e, Subgroup-specific gene expression of
FOXJ1 derived from ref. 1 (n = 209 independent samples). Error bars
indicate s.d. and interquartile range; horizontal bar indicates median.
f, Gene set enrichment analysis demonstrating significant enrichment
of the FOXJ1 transcriptional program derived from E14.5 mouse embryos
specifically in PF-EPN-B tumours (n = 209 independent samples). FDR-corrected
significance was evaluated by gene set enrichment analysis.
g, Significant FOXJ1 gene-expression correlations with proteins known
to regulate cilia assembly and function. P values for significant positive or
negative correlations have been corrected for multiple testing using the
Bonferroni method. h-m, FOXJ1 gene set enrichment plots of PF-EPN-A
(h), PF-EPN-B (i), PF-EPN-SE (j), ST-EPN-RELA (k), ST-EPN-YAP1 (l)
and ST-EPN-SE (m) ependymomas. FDR-corrected significance was evaluated
by gene set enrichment analysis, n = 209 independent samples.
Structure of the glucagon receptor in complex with a glucagon analogue

Haonan Zhang, Anna Qiao, Linlin Yang, Ned Van Eps, Klaus S. Frederiksen, Dehua Yang, Antao Dai,
Xiaoqing Cai, Hui Zhang, Cuiying Yi, Can Cao, Lingli He, Huaiyu Yang, Jesper Lau, Oliver P. Ernst,
Michael A. Hanson, Raymond C. Stevens, Ming-Wei Wang, Steffen Reedtz-Runge, Hualiang Jiang,
Qiang Zhao & Beili Wu


Class B G-protein-coupled receptors (GPCRs), which consist of 
an extracellular domain (ECD) and a transmembrane domain 
(TMD), respond to secretin peptides to play a key part in hormonal 
homeostasis, and are important therapeutic targets for a variety of 
diseases. Previous work has suggested that peptide ligands
bind to class B GPCRs according to a two-domain binding model, in
which the C-terminal region of the peptide targets the ECD and the
N-terminal region of the peptide binds to the TMD binding pocket.
Recently, three structures of class B GPCRs in complex with peptide
ligands have been solved. These structures provide essential
insights into peptide ligand recognition by class B GPCRs. However,
owing to resolution limitations, the specific molecular interactions
for peptide binding to class B GPCRs remain ambiguous. Moreover,
these previously solved structures have different ECD conformations
relative to the TMD, which raises questions regarding inter-domain
conformational flexibility and the changes required for
receptor activation. Here we report the 3.0 Å-resolution crystal
structure of the full-length human glucagon receptor (GCGR) in 
complex with a glucagon analogue and partial agonist, NNC1702. 
This structure provides molecular details of the interactions between 
GCGR and the peptide ligand. It reveals a marked change in the 
relative orientation between the ECD and TMD of GCGR compared 
to the previously solved structure of the inactive GCGR-NNC0640- 
mAb1 complex. Notably, the stalk region and the first extracellular 
loop undergo major conformational changes in secondary structure 
during peptide binding, forming key interactions with the peptide. 
We further propose a dual-binding-site trigger model for GCGR 
activation—which requires conformational changes of the stalk, 
first extracellular loop and TMD—that extends our understanding 
of the previously established two-domain peptide-binding model 
of class B GPCRs. 

Activation of GCGR by its endogenous ligand glucagon triggers the
release of glucose from the liver during fasting, and thus has an important
role in glucose homeostasis and is a potential drug target for type 2
diabetes. We recently determined the crystal structure of the full-length
GCGR in an inactive state in complex with the negative allosteric
modulator (NAM) NNC0640 and the antigen-binding fragment of an
inhibitory antibody, mAb1 (ref. 16). To further understand the molecular
mechanisms of peptide binding and receptor activation of GCGR, we
solved the 3.0 Å-resolution crystal structure of the full-length GCGR
bound to a glucagon analogue and low-potency partial agonist,
des-H1-[E9, K24(4x1E), L27] glucagon (NNC1702) (Fig. 1a, Extended
Data Fig. 1 and Extended Data Table 1; see Methods for design of this
partial agonist).

In the structure of the GCGR-NNC1702 complex, the ECD and
the bundle of seven transmembrane helices (I-VII) in the TMD adopt
similar conformations to those of the corresponding domains in the
previously determined structure of the GCGR-NNC0640-mAb1
complex, with Cα root-mean-square deviations of 1.2 Å and 1.5 Å,
respectively. However, the relative orientation between the ECD and
TMD in the peptide-bound GCGR structure differs markedly from
that in the inactive GCGR-NNC0640-mAb1 structure (Fig. 1b, c). This
was expected, considering that the ECD orientation in the
GCGR-NNC0640-mAb1 structure is not compatible with the two-domain
peptide-binding model for class B GPCRs. Comparison between the
GCGR-NNC1702 structure and the recently determined structures
of peptide-bound glucagon-like peptide-1 receptor (GLP-1R) shows
that the orientation of the ECD relative to the TMD is similar in the
GCGR-NNC1702 structure and the structure of GLP-1R bound
to glucagon-like peptide-1 (GLP-1) and Gs protein solved by
cryo-electron microscopy (Extended Data Fig. 2a, b), both of which
contain peptide ligands that interact with both the ECD and TMD.
However, the ECD orientation in the crystal structure of GLP-1R bound to
the truncated peptide agonist peptide 5 is substantially different
from that in the other two structures (Extended Data Fig. 2c, d). This
may be due to a lack of interactions between the ECD core and the
truncated peptide ligand, which either enables greater conformational
flexibility or promotes a unique inter-domain conformation.
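The Cα root-mean-square deviations quoted above summarize how closely matched Cα atoms of two structures coincide. A minimal sketch of the calculation follows. It assumes the residue pairs are already matched and the coordinates already optimally superimposed, for example with the Kabsch algorithm, which is not shown. The text does not state which software produced the published values, and the coordinates below are invented for illustration.

```python
from __future__ import annotations

import math


def rmsd(coords_a: list[tuple[float, float, float]],
         coords_b: list[tuple[float, float, float]]) -> float:
    """RMSD between matched, already superimposed atom coordinates (in Å)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical Cα positions for three matched residues in two structures.
    structure_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    structure_2 = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
    print(f"Cα RMSD: {rmsd(structure_1, structure_2):.2f} Å")
```

Computing RMSD separately for the ECD and the TMD after superimposing each domain on its own is what allows both domains to agree closely (1.2 Å and 1.5 Å) even though their relative orientation differs markedly.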

The GCGR-NNC1702 structure reveals secondary structure modifications
of the stalk region (residues G125-K136) and the first extracellular
loop (ECL1; residues S203-A220), compared to the inactive
GCGR-NNC0640-mAb1 structure (Fig. 2a). These two regions of
GCGR have previously been suggested to be important modulators
that regulate peptide ligand binding and receptor activation. In the
GCGR-NNC0640-mAb1 structure, the N-terminal portion of the
stalk (residues G125-Q131) and ECL1 (residues R201-S217) exhibit
extended β-strand conformations and make close contacts with each
other; they form a compact β-sheet structure, which is likely to stabilize
the receptor in an inactive conformation. By contrast, the stalk in the
peptide-bound GCGR structure forms a three-turn α-helical extension of
Figure 1 | Overall structure of GCGR-NNC1702 complex. a, Crystal
structure of GCGR-NNC1702 complex. GCGR and NNC1702 are shown
in cartoon representation. The ECD (residues Q27-D124), stalk (residues
G125-K136), TMD (residues M137-Y202 and V221-E426) and ECL1
(residues S203-A220) of the receptor and the peptide ligand NNC1702
are coloured orange, green, blue, magenta and red, respectively. Glycan
modifications in the ECD and disulfide bonds are displayed as orange and
yellow sticks, respectively. b, c, Structural comparison between the
GCGR-NNC1702 structure and the GCGR-NNC0640-mAb1 structure, shown
in side (b) and extracellular (c) views. The GCGR-NNC1702 structure
and the receptor in the GCGR-NNC0640-mAb1 structure (PDB ID:
5XEZ) are shown in cartoon representation and coloured blue and yellow,
respectively. The peptide NNC1702 is in red. The ECD of the receptor in
both structures is also shown in surface representation. The red arrow
in c indicates a rotation of the ECD in the GCGR-NNC1702 structure
compared to the GCGR-NNC0640-mAb1 structure.




helix I (Fig. 2b), a conformation similar to that observed in the previously
solved structure of the GCGR TMD (RCSB Protein Data Bank
(PDB) ID: 4L6R). The stalk has not been modelled in the GLP-1-GLP-1R-Gs
electron microscopy structure; in the GLP-1R-peptide 5
structure, the corresponding linker region forms an unstructured
loop rather than a helix, which may be explained by the absence of
interaction between this linker region and the truncated peptide ligand,
an interaction that is thought to stabilize the helical conformation of the stalk.

In the peptide-bound GCGR structure, ECL1 of the receptor no
longer forms a β-hairpin conformation. Instead, it is dissociated from
the stalk region and stands upwards in line with helices II and III
(Fig. 2c). The N-terminal segment of ECL1 (residues S203-I206) lacks
secondary structure; the C-terminal residues (D209-S217) form a
2.5-turn α-helix that is connected to helix III by a short linker
(residues D218-A220). A similar conformation of ECL1 is observed
for GLP-1R in the structures of the GLP-1-GLP-1R-Gs and GLP-1R-peptide 5
complexes. However, further structural details of ECL1
in inactive, non-peptide-bound GLP-1R are required to determine
whether the ECL1 of GLP-1R can undergo a similar conformational
change to that observed in the GCGR structures.

Previous mutagenesis and hydrogen-deuterium exchange (HDX)
studies suggest that the stalk and ECL1 of GCGR are involved
in peptide ligand binding. Indeed, both regions form extensive
Figure 2 | Conformations of the stalk and ECL1. a, Comparison of
the stalk and ECL1 between the GCGR-NNC1702 structure and the
GCGR-NNC0640-mAb1 structure. The GCGR-NNC1702 structure
and the receptor in the GCGR-NNC0640-mAb1 structure (PDB ID:
5XEZ) are shown in cartoon representation and coloured blue and yellow,
respectively. The peptide NNC1702 is in red. The stalk and ECL1 in the
GCGR-NNC1702 structure are coloured green and magenta, respectively.
The stalk and ECL1 in the GCGR-NNC0640-mAb1 structure are coloured
grey and pink, respectively. b, Highlight of the conformational difference
between the stalks in the GCGR-NNC1702 (green) and GCGR-NNC0640-mAb1
(grey) structures. c, Highlight of the conformational difference
between ECL1 in the GCGR-NNC1702 (magenta) and GCGR-NNC0640-mAb1
(pink) structures. d, Entrance to the orthosteric ligand-binding
pocket within the TMD. The receptor is shown in surface and cartoon
representations. The ECD, stalk, ECL1, ECL2 (residues E290-G302)
and TMD of GCGR are coloured orange, green, magenta, cyan and blue,
respectively. The peptide NNC1702 is shown in cartoon representation
and coloured red.


interactions with the peptide ligand in the GCGR-NNC1702 structure.
The stalk and ECL1 act as two 'arms' that hold the peptide tightly and
greatly strengthen the binding between the receptor and the middle
portion of the peptide (Fig. 2d). It has been proposed that the relative
movement and interaction dynamics of the ECD and TMD via the
stalk pivot point may be a common feature of class B GPCRs. Both the
GCGR-NNC1702 structure and the inactive GCGR-NNC0640-mAb1
structure support this concept and demonstrate that a large conformational
rearrangement of the stalk and ECL1, which includes the dissociation
of these two regions and changes in their secondary structure,
is required for peptide ligand binding. These data further support the
importance of the stalk and ECL1 in GCGR signal transduction.

The GCGR-NNC1702 crystal structure supports the two-domain
model of hormone recognition by class B GPCRs (Fig. 3, Extended
Data Fig. 3 and Extended Data Table 2). In the structure, NNC1702
forms a continuous α-helix throughout the whole length of the peptide
(Extended Data Fig. 4). The N-terminal half of the peptide ligand
(residues S2-L14; residue numbering is consistent with that in glucagon)
binds to the TMD ligand-binding pocket bordered by helices I,
II and VII and the second extracellular loop (ECL2) (Extended Data
Fig. 3a). The side chain of the N-terminal residue S2 of NNC1702,
which is an alanine (A8) in GLP-1, forms a hydrogen bond with residue
D385 on helix VII (Fig. 3b). This agrees with previous data showing
that the A8S
Figure 3 | Binding mode of NNC1702 to GCGR. a, Cutaway view
showing NNC1702 binding to the ECD (orange), stalk (green), ECL1
(magenta) and TMD (blue) of GCGR. The receptor is shown in surface
and cartoon representations. The peptide ligand is shown as red sticks
and a red cartoon. b-d, Interactions between the NNC1702 N terminus
and the GCGR TMD. The receptor and peptide NNC1702 in the
GCGR-NNC1702 structure are shown as cartoons and coloured grey and red,
respectively. Residues involved in interactions are shown as sticks and


mutant of GLP-1 restores binding of the GLP-1R mutant E387D,
whereas the S2A mutant of glucagon rescues binding of the GCGR
mutant D385E, which suggests an important role for this
hydrogen-bond interaction in recognising the glucagon N terminus.
Owing to steric hindrance caused by S2 and its contact with
D385, the extracellular tip of helix VI shifts away from the central
axis of the helical bundle by about 6.5 Å in the GCGR-NNC1702 structure
compared to the GCGR-NNC0640-mAb1 structure (Extended
Data Fig. 2e). This suggests that the rearrangement of the extracellular
half of helix VI may have a role in peptide ligand recognition by
GCGR. We anticipate that further movement of helix VI will occur
on G-protein coupling, similar to the conformation observed in the
structure of the GLP-1-GLP-1R-Gs complex.

The N-terminal region of NNC1702 makes multiple interactions with 
ECL2 of the receptor (Fig. 3c), demonstrating the critical role of ECL2 
in peptide ligand binding to GCGR (Supplementary Information). 
Two aromatic residues (F6 and Y10) within the N-terminal region 
of NNC1702, together with Y13 and L14 in the middle region of the 
peptide, form a hydrophobic patch, which has previously been sug- 
gested to be important for mediating binding affinity”. The side chain 
of F6 fits in a sub-pocket formed by several hydrophobic residues on 
helices I and VII (Fig. 3d). The importance of the hydrophobic nature 
and size of this sub-pocket in peptide ligand recognition is supported by 
previous mutagenesis data that show that the glucagon binding affinity 
of GCGR mutants Y145'3°A, Y145!9>N, L38279°PA, L3827*Vv, 
13867" and L38673"F is completely abolished or reduced by at least 
fivefold'®. The other aromatic residue (Y10) in the N-terminal region 
of NNC1702 wedges into a cleft between helices I and II, and forms 
hydrophobic interactions with the residues Y138!3° and L198?7!® of 
GCGR (Fig. 3d). Similar interactions between GLP-1R and the two 
corresponding hydrophobic residues (F12 and V16) of the truncated 
GLP-1 analogue peptide 5 are also observed in the GLP-1R-peptide 5 
complex structure!, which indicates that these two residues have an 
identical role in binding to GLP-1R to that of F6 and Y10 of glucagon 
in binding to GCGR. 

The only negatively charged residue, E9 (corresponding to D9 in 
glucagon), within the N-terminal region of NNC1702 forms a salt 
bridge with residue R3787>° at the extracellular tip of helix VII and 
a hydrogen bond with Q374 on the third extracellular loop (ECL3) 
(Fig. 3b). It was previously reported that the mutation D9E greatly 
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coloured brown (NNC1702), blue (GCGR TMD) and cyan (ECL2). Salt 
bridge and hydrogen bonds are displayed as red and green dashed lines, 
respectively. e, Interactions between NNC1702 and the stalk. Residues of 
the stalk and ECD are shown as green and yellow sticks, respectively. 

f. Interactions between NNC1702 and ECLI1. Residues of ECL1 are shown 
as magenta sticks. g. Interactions between NNC1702 and the GCGR ECD. 
Residues of the ECD are shown as yellow sticks. 


reduced glucagon potency in activating adenylate cyclase, and that 
the R378”3°°Q mutation in GCGR abolished glucagon binding to the 
receptor”!, both of which support the importance of the interaction 
with glucagon. The glucagon-GCGR binding pair D9 and R378” is 
conserved in GLP-1-GLP-1R as D15 and R380”, Similarly, muta- 
genesis studies*® support a possible ionic interaction between D15 
of GLP-1 and R3807°” in GLP-1R. However, the GLP-1-GLP-1R- 
Gs electron microscopy structure suggests that this interaction may 
potentially break up during the transition from an inactive to an active 
conformation (Supplementary Information). Compared to D9 in 
glucagon, the longer side chain of E9 in NNC1702 may form a stronger 
interaction with R378”°*" of GCGR and thus restrict the conforma- 
tional change of helix VII, consistent with the fact that D9E maintains 
binding affinity but reduces glucagon potency. 
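The superscript labels used in this section (for example R378⁷·³⁵ᵇ in GCGR and R380⁷·³⁵ᵇ in GLP-1R) place a residue relative to the most conserved position of its helix, which is numbered x.50. As a minimal sketch, assuming a continuous helix with no insertions, deletions or bulges between an anchor residue and the residue of interest, a label can be derived from one anchor by sequence offset. The function name `class_b_number` and the choice of R378 (7.35b) as anchor are illustrative assumptions, not the published procedure; helix bulges would break this simple arithmetic.

```python
def class_b_number(residue: int, anchor_residue: int, anchor_label: str) -> str:
    """Derive a class B (modified Ballesteros-Weinstein) label by sequence offset.

    Assumes the helix between `residue` and `anchor_residue` has no
    insertions, deletions or bulges, so positions map one-to-one.
    """
    helix, position = anchor_label.rstrip("b").split(".")
    new_position = int(position) + (residue - anchor_residue)
    if not 1 <= new_position <= 99:
        raise ValueError("offset falls outside the two-digit helix index")
    return f"{helix}.{new_position:02d}b"


# Anchor: R378 is 7.35b in GCGR, as stated in the text.
for res in (382, 385):
    print(res, class_b_number(res, 378, "7.35b"))
```

The same offset rule applied to helix II, anchored at R201 (2.74b), gives 2.71b for L198.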

The structure of the GCGR-NNC1702 complex reveals molecular 
details of the essential roles that the stalk and ECL1 of GCGR have 
in glucagon recognition²⁶. The stalk and ECL1 form extensive
interactions with the middle portion of NNC1702 (residues Y13-W25)
(Fig. 3e, f). The short α-helix (residues D209-S217) within ECL1
not only interacts with the peptide, but also makes contacts with the
N-terminal αA helix of the ECD through a hydrophobic core formed
by ECL1 residues V212, W215 and L216 and ECD residues M29 and
F33 (Fig. 3g). Together, the stalk, the ECL1 α-helix, the ECD
αA helix and the peptide ligand form a four-helical bundle (Extended
Data Fig. 3b), which greatly strengthens the interaction between GCGR 
and the peptide ligand. 
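Contacts such as the hydrophobic core formed by ECL1 residues V212, W215 and L216 and ECD residues M29 and F33 are typically identified with an inter-atomic distance criterion. A minimal sketch follows, assuming a 4.0 Å heavy-atom cutoff (a common convention, not a value stated in this paper) and using hypothetical coordinates; in practice the coordinates would come from the deposited model.

```python
import math
from itertools import product

Coord = tuple[float, float, float]


def residues_in_contact(atoms_a: list[Coord], atoms_b: list[Coord], cutoff: float = 4.0) -> bool:
    """Return True if any heavy-atom pair between two residues lies within `cutoff` (Å)."""
    return any(math.dist(a, b) <= cutoff for a, b in product(atoms_a, atoms_b))


# Hypothetical side-chain heavy-atom coordinates in Å, for illustration only.
w215 = [(10.0, 5.0, 3.0), (11.2, 5.8, 3.4)]
m29 = [(13.9, 6.1, 3.9)]
f33 = [(20.0, 9.0, 1.0)]

print(residues_in_contact(w215, m29))  # True: closest pair is about 2.8 Å apart
print(residues_in_contact(w215, f33))  # False: every pair is more than 4 Å apart
```

Checking every atom pair is quadratic in residue size, which is fine for single residues; whole-interface surveys usually use a spatial index instead.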

Figure 4 | Dual-binding-site trigger model for GCGR activation. The
inactive GCGR-NNC0640-mAb1 crystal structure (PDB ID: 5XEZ,
mAb1 removed), a hypothetical docking pose of the glucagon C terminus
to GCGR, the GCGR-NNC1702 crystal structure and the active GCGR
conformation represented by the GLP-1-GLP-1R-Gs electron microscopy
(EM) structure (PDB ID: 5VAI) are shown in cartoon and surface
representations in the middle panel. The ECD, stalk, ECL1 and TMD of the
receptor and the peptide ligands are coloured orange, green, magenta, blue
and red, respectively. The two sites of receptor-peptide interaction
that trigger receptor activation, site 1 and site 2, are shown in the top panel
and bottom panel, respectively, highlighting the conformational changes of
the receptor from the inactive to the active conformation. Top panel, side
view; bottom panel, extracellular view. The red arrows in the bottom right
panel indicate the shifts of helices I, VI and VII in the active conformation
compared to the inactive conformation.

Previous efforts²³ to develop potent truncated glucagon antagonists
demonstrate that the two arginine residues R17 and R18 of glucagon are
sufficient for the peptide to be recognized by the receptor. Besides the
hydrogen bond between R17 and residue Q131 on the stalk (Fig. 3e),
R18 forms an arginine-π interaction with residue W215 on ECL1 (Fig. 3f)
that has a critical role in stabilizing receptor-peptide binding, as the
mutation W215L completely abolishes glucagon binding to GCGR"®.
Two bulky residues at the junction between helix II and ECL1, R201²·⁷⁴ᵇ
and Y202²·⁷⁵ᵇ, have been suggested to be important for glucagon
binding!”"'®6. In the GCGR-NNC1702 structure, Y202²·⁷⁵ᵇ forms
hydrophobic interactions with residues L14 and R18 of the peptide
(Fig. 3f), in agreement with previous mutagenesis data!* showing
that replacing Y202²·⁷⁵ᵇ with alanine abolishes glucagon binding to the
receptor. By contrast, the mutant R201²·⁷⁴ᵇA reduces glucagon binding
by only about sixfold!’; this could be explained by a lack of direct
contact between this residue and the peptide in the GCGR-NNC1702
structure. Notably, an R201²·⁷⁴ᵇD mutation considerably decreases
the binding of GCGR to glucagon'®”*. Further analysis of the
peptide-bound GCGR structure revealed a ‘sandwich’ stacking interaction
formed by R201²·⁷⁴ᵇ and W215 on ECL1 and the peptide residue R18
(Fig. 3f) that stabilizes the conformation of ECL1 and its interaction
with the peptide. This structural feature is supported by the fact that the
R201²·⁷⁴ᵇD mutant loses glucagon-binding ability, probably as a result
of disturbance to the arginine-π-arginine stacking interaction caused
by this negatively charged residue. 
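The mutagenesis comparisons in this section (binding "reduced by at least fivefold", or by "only about sixfold") are ratios of an affinity measure between mutant and wild-type receptor. A minimal sketch, assuming affinities are reported as IC50 values in the same units and that "abolished" means no measurable IC50; the fivefold threshold mirrors the text, while the function names and example values are hypothetical.

```python
from typing import Optional


def fold_reduction(ic50_wt: float, ic50_mut: Optional[float]) -> float:
    """Fold loss of binding affinity: IC50(mutant) / IC50(wild type).

    Returns infinity when binding is abolished (no measurable IC50).
    """
    if ic50_wt <= 0:
        raise ValueError("wild-type IC50 must be positive")
    if ic50_mut is None:
        return float("inf")
    return ic50_mut / ic50_wt


def classify(fold: float, threshold: float = 5.0) -> str:
    """Label a fold change using the fivefold threshold quoted in the text."""
    if fold == float("inf"):
        return "abolished"
    return "reduced" if fold >= threshold else "modest"


# Hypothetical IC50 values in nM, for illustration only.
print(classify(fold_reduction(2.0, 12.0)))  # reduced (6-fold)
print(classify(fold_reduction(2.0, None)))  # abolished
```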

Together, the GCGR-NNC1702 crystal structure and the structure
of the inactive GCGR-NNC0640-mAb1 complex expand the
previously established?! two-domain peptide-binding model of class B
GPCRs by incorporating another agonist trigger associated with an
inter-domain conformational shift coupled with a change of secondary
structure in the stalk region and ECL1 (Fig. 4). Binding of the
glucagon C terminus to the ECD may disrupt the β-sheet structure of the
stalk and ECL1 and result in dissociation between these two regions,
which potentially triggers a conformational change of the ECD relative
to the TMD and initiates receptor activation. Using double
electron-electron resonance (DEER) spectroscopy, we have demonstrated the
conformational change of the ECD on peptide binding, which shows
that the peptide NNC1702 induces a conformational rearrangement
of the receptor ECD to accommodate peptide binding (Extended Data
Fig. 5). The second set of interactions, between the peptide N terminus
and the TMD, may enable further changes in the secondary structure
of the stalk and ECL1. The conformational change of the
stalk may not only mediate the receptor-peptide interaction, but also
potentially facilitate conformational movements of the TMD helical
bundle through its effect on the conformation of helix I, which shifts
towards helix VII on the extracellular side in the active structures of
calcitonin receptor and GLP-1R compared to the inactive class B GPCR
structures’-!*. Together with the movement of helix I, the
rearrangements of helices VI and VII at the extracellular ends, which may be
partially induced by the interaction between H1 of glucagon and the
receptor (as suggested by our molecular dynamics simulation studies;
Extended Data Fig. 6 and Supplementary Information), are further
relayed into conformational changes in the cytoplasmic domain, which
lead to G-protein coupling and full receptor activation. In contrast to
the two-domain binding model, the interactions in the middle region
of the peptide (site 1) are critical not only for driving affinity of the
peptide but also for triggering the necessary conformational changes
of the stalk and ECL1 that are associated with full receptor activation.
The peptide N terminus (site 2) induces further conformational
rearrangement of the transmembrane helical bundle that is also
essential for full receptor activation (Fig. 4). This dual-binding-site trigger
model for GCGR activation updates the long-standing paradigm that
N-terminal peptide interactions are solely responsible for triggering
agonist-associated conformational changes, and is consistent with
the idea that truncated peptides for class B GPCRs can act as partial
agonists²⁷, potentially by triggering conformational changes in the
ECD, stalk and/or ECL1.

In summary, the GCGR-NNC1702 crystal structure sheds light on
both the complexity and the molecular details that govern the peptide
binding and receptor activation of GCGR, and thereby greatly expands
our understanding of signal transduction by class B GPCRs.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 10 July; accepted 20 November 2017. 


1. Drucker, D. J. The biology of incretin hormones. Cell Metab. 3, 153-165
(2006). 

2. Mulder, J. E., Kolatkar, N. S. & LeBoff, M. S. Drug insight: existing and emerging 
therapies for osteoporosis. Nat. Clin. Pract. Endocrinol. Metab. 2, 670-680 
(2006). 

3. Brenneman, D. E. Neuroprotection: a comparative view of vasoactive intestinal 
peptide and pituitary adenylate cyclase-activating polypeptide. Peptides 28, 
1720-1726 (2007). 

4. Sherwood, N. M., Krueckl, S. L. & McRory, J. E. The origin and function of the 
pituitary adenylate cyclase-activating polypeptide (PACAP)/glucagon 
superfamily. Endocr. Rev. 21, 619-670 (2000). 

5. Gilligan, P. J. & Li, Y. W. Corticotropin-releasing factor antagonists: recent
advances and exciting prospects for the treatment of human diseases.
Curr. Opin. Drug Discov. Devel. 7, 487-497 (2004).

6. Finan, B. et al. Chemical hybridization of glucagon and thyroid hormone 
optimizes therapeutic impact for metabolic disease. Cell 167, 843-857.e14 
(2016). 

7. Longuet, C. et al. The glucagon receptor is required for the adaptive metabolic 
response to fasting. Cell Metab. 8, 359-371 (2008). 

8. Egerod, K. L. et al. A major lineage of enteroendocrine cells coexpress CCK, 
secretin, GIP, GLP-1, PYY, and neurotensin but not somatostatin. Endocrinology 
153, 5782-5795 (2012). 

9. Hollenstein, K. et al. Insights into the structure of class B GPCRs. Trends
Pharmacol. Sci. 35, 12-22 (2014). 

10. Parthier, C., Reedtz-Runge, S., Rudolph, R. & Stubbs, M. T. Passing the baton in 
class B GPCRs: peptide hormone activation via helix induction? Trends 
Biochem. Sci. 34, 303-310 (2009). 

11. Mann, R., Wigglesworth, M. J. & Donnelly, D. Ligand-receptor interactions at the 
parathyroid hormone receptors: subtype binding selectivity is mediated via an 
interaction between residue 23 on the ligand and residue 41 on the receptor. 
Mol. Pharmacol. 74, 605-613 (2008). 

12. Liang, Y. L. et al. Phase-plate cryo-EM structure of a class B GPCR-G-protein 
complex. Nature 546, 118-123 (2017). 

13. Zhang, Y. et al. Cryo-EM structure of the activated GLP-1 receptor in complex 
with a G protein. Nature 546, 248-253 (2017). 

14. Jazayeri, A. et al. Crystal structure of the GLP-1 receptor bound to a peptide 
agonist. Nature 546, 254-258 (2017). 


4 JANUARY 2018 | VOL 553 | NATURE | 109 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


15. Cho, Y. M., Merchant, C. E. & Kieffer, T. J. Targeting the glucagon receptor family 
for diabetes and obesity therapy. Pharmacol. Ther. 135, 247-278 (2012). 

16. Zhang, H. et al. Structure of the full-length glucagon class B G-protein-coupled 
receptor. Nature 546, 259-264 (2017).

17. Yang, L. et al. Conformational states of the full-length glucagon receptor. 
Nat. Commun. 6, 7859 (2015). 

18. Siu, F. Y. et al. Structure of the human glucagon class B G-protein-coupled
receptor. Nature 499, 444-449 (2013). 

19. Ballesteros, J. A. & Weinstein, H. Integrated methods for the construction of 
three-dimensional models and computational probing of structure-function 
relations in G protein-coupled receptors. Methods Neurosci. 25, 366-428 
(1995). 

20. Wootten, D., Simms, J., Miller, L. J., Christopoulos, A. & Sexton, P. M. Polar 
transmembrane interactions drive formation of ligand-specific and signal 
pathway-biased family B G protein-coupled receptor conformations. Proc. Natl
Acad. Sci. USA 110, 5211-5216 (2013). 

21. Yang, D. et al. Structural determinants of binding the seven-transmembrane 
domain of the glucagon-like peptide-1 receptor (GLP-1R). J. Biol. Chem. 291, 
12991-13004 (2016). 

22. Runge, S. et al. Three distinct epitopes on the extracellular face of the glucagon 
receptor determine specificity for the glucagon amino terminus. J. Biol. Chem. 
278, 28005-28010 (2003). 

23. Ahn, J. M., Medeiros, M., Trivedi, D. & Hruby, V. J. Development of potent 
truncated glucagon antagonists. J. Med. Chem. 44, 1372-1379 (2001). 

24. Unson, C. G., Andreu, D., Gurzenda, E. M. & Merrifield, R. B. Synthetic peptide 
antagonists of glucagon. Proc. Natl Acad. Sci. USA 84, 4083-4087 (1987).

25. Moon, M. J. et al. Ligand binding pocket formed by evolutionarily conserved 
residues in the glucagon-like peptide-1 (GLP-1) receptor core domain. J. Biol. 
Chem. 290, 5696-5706 (2015). 

26. Unson, C. G. et al. Roles of specific extracellular domains of the glucagon 
receptor in ligand binding and signaling. Biochemistry 41, 11795-11803 
(2002). 

27. Yin, Y. et al. An intrinsic agonist mechanism for activation of glucagon-like 
peptide-1 receptor by its extracellular domain. Cell Discov. 2, 16042 (2016). 




Supplementary Information is available in the online version of the paper. 


Acknowledgements This work was supported by CAS Strategic Priority 
Research Program XDB08020000, CAS grants QYZDB-SSW-SMC024 (B.W.)
and QYZDB-SSW-SMC054 (Q.Z.), the National Science Foundation of China
grants 31422017 (B.W.) and 81525024 (Q.Z.), the Shanghai Science and
Technology Development Fund 15DZ2291600 (M.-W.W.), the E-Institutes of
Shanghai Municipal Education Commission (E09013), the Special Program 
for Applied Research on Super Computation of the NSFC-Guangdong Joint 
Fund (second phase) under Grant No. U1501501, and the Canada Excellence 
Research Chairs program and the Canadian Institute for Advanced Research 
(O.P.E.). O.P.E. holds the Anne and Max Tanenbaum Chair in Neuroscience. 
We also thank the computer centre of East China Normal University for 
computational resources. The synchrotron radiation experiments were 
performed at the BL41XU of SPring-8 with the approval of the Japan 
Synchrotron Radiation Research Institute (proposal numbers 2016B2517, 
2016B2518, 2017A2505 and 2017A2506). We thank the beamline staff 
members K. Hasegawa, N. Mizuno, T. Kawamura and H. Murakami of the 
BL41XU for help with X-ray data collection. 


Author Contributions Ha.Z. optimized the construct, developed the purification 
procedure and purified the GCGR proteins for crystallization, performed 
crystallization trials and optimized crystallization conditions. A.Q. helped with 
construct optimization and crystallization trials. L.Y. performed and analysed
molecular dynamics simulations. N.V.E. performed and analysed DEER 
spectroscopy. K.S.F. performed and analysed binding and potency assays of 
glucagon and NNC1702. D.Y., A.D. and X.C. designed, performed and analysed 
the whole-cell glucagon binding assay. Hu.Z. collected the X-ray diffraction 
data. C.Y. expressed the GCGR proteins. C.C. and L.H. helped to analyse the 
conformational variety of GCGR. J.L., O.P.E., M.A.H., R.C.S., M.-W.W. and S.R.-R.
helped with structure analysis and interpretation, and edited the manuscript. 
O.P.E. oversaw DEER spectroscopy. M.-W.W. oversaw the whole-cell glucagon 
binding assay. H.Y. and H.J. oversaw molecular dynamics simulations and 
commented on the manuscript. S.R.-R. designed the peptide and oversaw 
ligand characterization of NNC1702. Q.Z. and B.W. initiated the project, planned 
and analysed experiments, solved the structures, supervised the research and 
wrote the manuscript with input from all co-authors. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare competing financial interests: 
details are available in the online version of the paper. Readers are welcome to 
comment on the online version of the paper. Publisher’s note: Springer Nature 
remains neutral with regard to jurisdictional claims in published maps and 
institutional affiliations. Correspondence and requests for materials should be 
addressed to B.W. (beiliwu@simm.ac.cn) or Q.Z. (zhaoq@simm.ac.cn). 


Reviewer Information Nature thanks G. Schertler, D. Wootten and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 




METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessments. 

Peptide design of NNC1702. NNC1702, a low-potency partial agonist of GCGR, 
was designed to have reduced agonist activity but to maintain relatively high 
binding affinity to the receptor by deleting the N-terminal residue H1 of glucagon 
and introducing the mutation D9E* (Extended Data Fig. 1), aiming for better 
stability of the GCGR-peptide complex compared to that of the full agonist-bound 
receptor. Two more mutations, Q24K(4×E) and M27L, were included to improve
the solubility and stability of the peptide at neutral pH. 
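The sequence edits described above can be checked against human glucagon (HSQGTFTSDYSKYLDSRRAQDFVQWLMNT, residues 1-29). The sketch below applies them in glucagon numbering. The helper name `apply_mutations` is hypothetical, and the 4×E modification on K24 is only noted in a comment, because it cannot be represented in a one-letter sequence string.

```python
GLUCAGON = "HSQGTFTSDYSKYLDSRRAQDFVQWLMNT"  # human glucagon, residues 1-29


def apply_mutations(seq: str, mutations: list[str]) -> str:
    """Apply point mutations written as e.g. 'D9E' (1-based glucagon numbering)."""
    residues = list(seq)
    for m in mutations:
        wt, pos, new = m[0], int(m[1:-1]), m[-1]
        if residues[pos - 1] != wt:
            raise ValueError(f"{m}: expected {wt}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


# D9E, Q24K and M27L as described; the 4xE attachment on K24 is not encoded.
mutated = apply_mutations(GLUCAGON, ["D9E", "Q24K", "M27L"])
nnc1702_backbone = mutated[1:]  # delete H1, so the peptide starts at S2
print(nnc1702_backbone, len(nnc1702_backbone))  # SQGTFTSEYSKYLDSRRAQDFVKWLLNT 28
```

The resulting 28 residues (S2-T29) match the peptide chain traced in the final crystallographic model.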

Cloning and insect-cell expression of GCGR. The codon-optimized human 
GCGR gene (Genewiz) was cloned into a modified pFastBac1 vector with a
haemagglutinin signal sequence at the N terminus and a PreScission protease
site followed by a 10×His tag and a Flag tag at the C terminus. The native signal
peptide, M1-A26, was removed from the N terminus of the receptor. T4-lysozyme 
was fused into the second intracellular loop (ICL2) of GCGR between residues 
T257 and E260. To further improve protein thermostability, 45 residues were
truncated at the C terminus and the mutation R173²·⁴⁶ᵇA was introduced. Our
ligand-binding assay showed that the binding affinity of the engineered GCGR to 
both glucagon and NNC1702 is close to that of the wild-type receptor (Extended 
Data Fig. 1d, e). The engineered receptor displayed a higher binding affinity to 
NNC1702 compared to glucagon, in agreement with the fact that the construct was 
optimized to improve protein stability of the GCGR-NNC1702 complex. 
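The construct boundaries can be tracked in receptor numbering. The sketch below is illustrative only. It assumes a 477-residue human GCGR precursor (UniProt P47871) and treats the T4-lysozyme insertion as replacing residues 258-259, which is inferred from the modelled ranges Q27-T257 and E260-E426 reported below. The function name is hypothetical, and this is not the authors' cloning design.

```python
FULL_LENGTH = 477  # assumption: human GCGR precursor length (UniProt P47871)


def construct_layout(signal_end: int = 26, icl2_left: int = 257,
                     icl2_right: int = 260, c_trunc: int = 45,
                     full_length: int = FULL_LENGTH) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return the two GCGR segments flanking the ICL2 T4-lysozyme insertion.

    Residues icl2_left+1 .. icl2_right-1 (here 258-259) are assumed to be
    replaced by T4-lysozyme; numbering refers to the full-length receptor.
    """
    start = signal_end + 1          # native signal peptide M1-A26 removed
    end = full_length - c_trunc     # 45 C-terminal residues truncated
    return (start, icl2_left), (icl2_right, end)


n_seg, c_seg = construct_layout()
receptor_residues = sum(last - first + 1 for first, last in (n_seg, c_seg))
print(f"GCGR {n_seg[0]}-{n_seg[1]} | T4L | GCGR {c_seg[0]}-{c_seg[1]}: "
      f"{receptor_residues} receptor residues")
```

If the length assumption holds, the modelled receptor residues (up to E426) lie within these construct boundaries.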

GCGR expression was performed using the same procedure as previously
described!®. The optimized GCGR construct was expressed in Spodoptera
frugiperda (Sf9) insect cells (Invitrogen) using the Bac-to-Bac Baculovirus
Expression System (Invitrogen).

Purification of GCGR-NNC1702 complex. The cells expressing the
GCGR-T4-lysozyme protein were lysed in a lysis buffer containing 10 mM HEPES,
pH 7.5, 20 mM KCl, 10 mM MgCl2 and EDTA-free protease inhibitor cocktail
tablets (Roche), and the membranes were then washed three times with a high-salt
buffer containing 10 mM HEPES, pH 7.5, 1 M NaCl, 20 mM KCl and 10 mM MgCl2.
Purified membranes were resuspended in 10 ml lysis buffer supplemented with
40% glycerol and stored at −80°C until use.

Prior to solubilization, the purified membranes were thawed in 30 ml buffer
containing 10 mM HEPES, pH 7.5, 20 mM KCl, 10 mM MgCl2, 13% glycerol, 40 µM
NNC1702 and EDTA-free protease inhibitor cocktail (Roche) at 4°C for 1 h. The
receptor was then solubilized in 25 mM HEPES, pH 7.5, 150 mM NaCl, 1% (w/v)
n-dodecyl-β-D-maltopyranoside (DDM, Anatrace) and 0.2% (w/v) cholesteryl
hemisuccinate (CHS, Sigma) at 4°C for 3 h. The supernatant was isolated by
ultracentrifugation at 160,000g for 30 min and incubated with TALON
resin (Clontech) overnight at 4°C.

The TALON resin was washed with 25 column volumes of wash buffer 1
containing 25 mM HEPES, pH 7.5, 150 mM NaCl, 0.05% (w/v) DDM, 0.01% (w/v)
CHS, 10% glycerol, 10 µM NNC1702 and 30 mM imidazole, followed by
10 column volumes of wash buffer 2 containing 25 mM HEPES, pH 7.5, 150 mM
NaCl, 0.05% (w/v) DDM, 0.01% (w/v) CHS, 10% glycerol, 20 µM NNC1702 and
15 mM imidazole. The GCGR-NNC1702 complex was eluted with 5 column
volumes of 25 mM HEPES, pH 7.5, 150 mM NaCl, 0.05% (w/v) DDM, 0.01% (w/v)
CHS, 10% glycerol, 50 µM NNC1702 and 300 mM imidazole. A PD MiniTrap
G-25 column (GE Healthcare) was used to remove imidazole. The sample was
treated overnight with custom-made His-tagged PreScission protease to remove
the C-terminal His tag and Flag tag, and custom-made His-tagged PNGase F
was also added to deglycosylate the receptor. The cleaved
GCGR-NNC1702 complex was collected in the flow-through of a Ni-NTA column
(Qiagen), and then concentrated to 20-30 mg ml⁻¹ with a 100-kDa molecular
weight cut-off concentrator (Millipore).

Crystallization in lipidic cubic phase. Crystallization was performed using the 
lipidic cubic phase (LCP) method” at 20°C. The protein sample (20-30 mg ml⁻¹)
was mixed with lipid (7.8 MAG/cholesterol 9:1 by mass) at a ratio of 1:1 (v/w) using 
a syringe mixer. The LCP mixture was dispensed onto 96-well glass sandwich 
plates (Shanghai FAstal, BioTech) in 35-40-nl drops and overlaid with 800 nl 
precipitant solution using a Gryphon robot (Art Robbins). Protein reconstitution 
in LCP and crystallization trials were performed at room temperature. Plates were 
incubated and imaged at 20°C using an automated incubator-imager (RockImager, 
Formulatrix). The crystals of the GCGR-NNC1702 complex grew in 100 mM Tris, 
pH 8.0, 70-120 mM potassium phosphate dibasic, 27-33% (v/v) PEG 200, and 
reached a maximum size of 200 µm × 10 µm × 10 µm after four days. Crystals were
collected using 75-100-µm MiTeGen micromounts (M2-L19-50/150, MiTeGen)
and immediately flash-frozen in liquid nitrogen. 
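The mixing ratios above translate directly into lipid amounts per setup. A minimal sketch, assuming the 1:1 (v/w) ratio means 1 µl of protein solution per 1 mg of lipid and that the 9:1 MAG/cholesterol ratio is by mass, as stated; the 20 µl protein volume and the function name are hypothetical.

```python
def lcp_lipid_masses(protein_ul: float, protein_to_lipid_v_w: float = 1.0,
                     mag_fraction: float = 0.9) -> tuple[float, float]:
    """Return (7.8 MAG mass, cholesterol mass) in mg for a given protein volume.

    Assumes protein volume (µl) : lipid mass (mg) = protein_to_lipid_v_w : 1.
    """
    lipid_mg = protein_ul / protein_to_lipid_v_w
    return lipid_mg * mag_fraction, lipid_mg * (1 - mag_fraction)


mag_mg, chol_mg = lcp_lipid_masses(20.0)
print(f"7.8 MAG: {mag_mg:.1f} mg, cholesterol: {chol_mg:.1f} mg")
```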

X-ray diffraction data collection and structure determination. Data collection 
was performed at the SPring-8 beam line 41XU, Hyogo, using a Pilatus3 6M 
detector (X-ray wavelength 1.0000 Å). The crystals were exposed to an
11 µm × 9 µm mini-beam for 0.2 s with 0.2° oscillation per frame. Owing to
radiation damage, data collection was limited to 5-10° per crystal. Diffraction 
data from 10 crystals were integrated and scaled using XDS”. 
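The collection strategy implies a small number of frames per crystal. As a quick check, assuming each frame covers 0.2° with a 0.2-s exposure and that every crystal contributed a full 5-10° wedge (the per-crystal wedges actually merged are not reported):

```python
def wedge_stats(wedge_deg: float, osc_deg: float = 0.2,
                exposure_s: float = 0.2) -> tuple[int, float]:
    """Number of frames and total exposure time for one crystal's rotation wedge."""
    frames = round(wedge_deg / osc_deg)
    return frames, frames * exposure_s


for wedge in (5.0, 10.0):
    frames, seconds = wedge_stats(wedge)
    print(f"{wedge:.0f} deg wedge: {frames} frames, {seconds:.1f} s of exposure")

n_crystals = 10
print(f"{n_crystals} crystals: {n_crystals * 5}-{n_crystals * 10} deg of total rotation")
```

Wedges this short are why data from several crystals had to be merged to reach a complete dataset.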

The structure of the GCGR-NNC1702 complex was solved by molecular 
replacement implemented in Phaser*’ using the models of GCGR TMD in the 
structure of the GCGR-NNC0640-mAb1 complex, GCGR ECD in the structure 
of the GCGR-ECD-mAb1 complex and T4-lysozyme (PDB IDs: 5XEZ, 4LF3 and 
2RH1, respectively). One molecule of GCGR TMD, one molecule of GCGR ECD
and one molecule of T4-lysozyme were found sequentially by molecular replace- 
ment search. The structure was initially solved and refined to an Rfree of approxi- 
mately 40% with REFMAC*!. The model maps from the data were of sufficient 
quality to interpret the overall structure of the GCGR-NNC1702 complex; the 
stalk, ECL1 and the peptide NNC1702 were built on the basis of the electron density
map. The model then underwent iterated cycles of manual building into |2Fo| − |Fc|
maps with Coot* and refinement with REFMAC?! and BUSTER™. The structure 
was carefully refined, and Ramachandran plot analysis indicates that 100% of the 
residues are in favourable (94.0%) or allowed (6.0%) regions (no outliers). 

The final model of the GCGR-NNC1702 complex contains 398 residues 
(Q27-T257 and E260-E426) of GCGR, 28 residues (S2-T29) of NNC1702 and 160 
residues (N2-Y161) of T4-lysozyme. The 4×E tail of the mutation Q24K(4×E)
in NNC1702 was not traced owing to poor electron densities. There is no residue 
from the receptor or neighbouring molecules adjacent to this mutation, which 
reduces the possibility that this tail had an effect on the structure. The ECD of the 
receptor forms contacts with the T4-lysozyme fusion proteins from two
neighbouring molecules in the crystal lattice with buried surface areas of 150 Å² and
340 Å², which are much smaller than the buried surface area between the ECD,
TMD and the peptide ligand (4,760 Å²). This indicates that the lattice interactions
of the ECD are considerably weaker than the interactions between the ECD and the 
TMD or peptide, which have key roles in stabilizing the ECD conformation. This 
suggests that crystal packing is unlikely to have an effect on the conformational 
change of the GCGR ECD. 
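Two numbers in this paragraph can be cross-checked with simple arithmetic. The residue counts of the final model follow from the modelled ranges, and the ECD lattice contacts can be expressed as a fraction of the ECD-TMD-peptide interface. The sketch uses only values given in the text.

```python
def count_residues(ranges: list[tuple[int, int]]) -> int:
    """Number of residues covered by inclusive (first, last) ranges."""
    return sum(last - first + 1 for first, last in ranges)


gcgr = count_residues([(27, 257), (260, 426)])  # Q27-T257 and E260-E426
peptide = count_residues([(2, 29)])             # S2-T29 of NNC1702
t4l = count_residues([(2, 161)])                # N2-Y161 of T4-lysozyme
print(gcgr, peptide, t4l)                       # 398 28 160

lattice_bsa = 150 + 340   # Å², ECD contacts with two T4L molecules
interface_bsa = 4760      # Å², ECD/TMD/peptide interface
print(f"lattice/interface = {lattice_bsa / interface_bsa:.1%}")  # 10.3%
```

The combined lattice contacts amount to about a tenth of the functional interface, which is the quantitative basis for the statement that crystal packing is unlikely to drive the ECD conformation.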

Ligand-binding assay. To determine the binding affinities of human glucagon 
and NNC1702 to GCGR, we performed scintillation proximity assay (SPA) 
binding using plasma membranes from BHK cells expressing the human GCGR. 
The BHK cell line was stably transfected with GCGR and CRE luciferase. Cells 
were routinely tested for mycoplasma contamination. Plasma membranes were 
prepared by washing cultured cells in PBS before lysis in ice-cold 25 mM HEPES, 
2mM MgCl2 and 1mM EDTA (HME) buffer. Tubes with lysed cells were frozen
in liquid nitrogen and quickly thawed again. The thawed cell lysate was vortexed at 
maximum speed for 20s and centrifuged at 20,000g for 10 min at 4°C. Pellets were 
re-suspended in HME buffer and protein concentrations determined by BioRad 
protein assay (Bradford; BioRad). Membranes (5 µg per well) were combined with
2.5 mg ml⁻¹ wheat germ agglutinin (WGA)-coated SPA beads (Perkin Elmer),
diluted ligand (highest final concentration, 1 µM) and ¹²⁵I-labelled glucagon-NH
(60 pM) in binding buffer (50 mM HEPES, pH 7.4, 5 mM MgCl2, 1 mM CaCl2,
0.02% Tween-20 and 0.1% ovalbumin) and incubated for 2h at 25°C. Assay plates
(OptiplateTM-96, Perkin Elmer) were centrifuged for 10 min at 1,500 r.p.m. at
room temperature before counting using a Topcounter (Perkin Elmer). Half
maximal inhibitory concentration (IC50) values were calculated with the GraphPad
Prism software (version 7.0a, GraphPad). 
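IC50 values were calculated in GraphPad Prism. For readers without Prism, the sketch below fits a one-site competition curve (a logistic with the Hill slope fixed at −1 and fixed top and bottom plateaus) to estimate log10 IC50 by grid search over the sum of squared residuals. It stands in for Prism's nonlinear regression and does not reproduce it. The plateau values, search range and synthetic data are assumptions.

```python
import math


def competition_curve(log_conc: float, log_ic50: float,
                      top: float = 100.0, bottom: float = 0.0) -> float:
    """One-site competition (Hill slope -1): specific binding vs log10 [ligand] (M)."""
    return bottom + (top - bottom) / (1.0 + 10 ** (log_conc - log_ic50))


def fit_log_ic50(log_concs: list[float], responses: list[float],
                 lo: float = -12.0, hi: float = -5.0, step: float = 0.001) -> float:
    """Grid-search log10(IC50) minimizing the sum of squared residuals."""
    best_x, best_sse = lo, math.inf
    for i in range(int(round((hi - lo) / step)) + 1):
        x = lo + i * step
        sse = sum((competition_curve(c, x) - r) ** 2 for c, r in zip(log_concs, responses))
        if sse < best_sse:
            best_x, best_sse = x, sse
    return best_x


# Synthetic data generated with IC50 = 1 nM; highest concentration 1 µM, as in the assay.
log_concs = [-12 + 0.5 * i for i in range(13)]  # 1 pM ... 1 µM
data = [competition_curve(c, -9.0) for c in log_concs]
log_ic50 = fit_log_ic50(log_concs, data)
print(f"IC50 ≈ {10 ** log_ic50 * 1e9:.2f} nM")
```

A grid search is crude but transparent. With real, noisy data the plateaus would normally be fitted too, or constrained by control wells.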

Ligand-potency assay. To determine the potencies of glucagon and NNC1702, 
we activated human GCGR in transfected BHK cells. The BHK cell line was stably 
transfected with the human GCGR and CRE luciferase. The assay was performed in
DMEM without phenol red (Gibco 11880-028, Thermo Fisher Scientific)
supplemented with 10 mM HEPES (Gibco 15630), 1× Glutamax (Gibco 35050), 1% ovalbumin and
0.1% Pluronic F-68. Ligands were dissolved into 300 µM stocks in 80% DMSO,
and serial dilutions were prepared in medium with 1 µM as the highest final
concentration. Before incubation in Black Microwell 96-well plates (Thermo Fisher
Scientific), cells were washed twice in PBS and adjusted to 100,000 cells per ml. The
assay plate was incubated for 3h in 5% CO2 at 37°C. Aliquots of Steadylite Plus
were added to each well and the plate was shaken for 30 min at room temperature before
luminescence was read on a BioTek Synergy2 reader (BioTek). Half maximal
effective concentration (EC50) values were calculated with Prism software.
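The stock and top concentrations determine how much DMSO is carried into the assay. A minimal sketch, assuming the top concentration is made by diluting the 300 µM stock directly in medium and that a threefold, eight-point serial dilution follows. The dilution factor and number of points are not stated in the text and are assumptions.

```python
def serial_dilution(top_molar: float, factor: float, points: int) -> list[float]:
    """Concentrations (M) of a serial dilution series, highest first."""
    return [top_molar / factor ** i for i in range(points)]


stock = 300e-6   # 300 µM stock in 80% DMSO
top = 1e-6       # highest final concentration, 1 µM
dilution = stock / top
final_dmso_percent = 80.0 / dilution
print(f"{dilution:.0f}-fold dilution -> {final_dmso_percent:.2f}% DMSO at the top concentration")

series = serial_dilution(top, factor=3.0, points=8)  # assumed 3-fold, 8 points
print([f"{c * 1e9:.2f} nM" for c in series])
```

If the diluent is plain medium, lower concentrations carry proportionally less DMSO. A DMSO-matched diluent would keep the solvent constant across the curve.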
Whole-cell ligand-binding assay. To determine the binding affinity of glucagon 
and NNC1702 to the engineered GCGR used for crystallization, we performed a 
whole-cell ligand-binding assay. CHO-K1 cells (obtained from American Type 
Culture Collection) were seeded onto 96-well cell-culture plates (PerkinElmer)
treated with poly-D-lysine, at a density of 3 x 10° cells per well. The cells were
routinely tested for mycoplasma contamination. After overnight culture, the 
cells were transiently transfected with wild-type or the engineered GCGR using 
Lipofectamine 2000 transfection reagent (Invitrogen). Cells were collected 24h after 
transfection, washed twice and incubated with a blocking buffer (F12 supplemented 
with 33 mM HEPES, pH 7.4, and 0.1% BSA) for 2h at 37°C. Cells were then washed
twice with PBS and incubated in a binding buffer (PBS supplemented with 10%
BSA, pH 7.4) with a constant concentration of ¹²⁵I-labelled glucagon (60 pM) and
varying concentrations of unlabelled glucagon and NNC1702 (17.86 pM-5 µM)
at room temperature for 3h. Cells were washed three times with ice-cold PBS and
lysed in 50 µl lysis buffer (PBS supplemented with 20 mM Tris-HCl, 1% Triton
X-100, pH 7.4). The plates were subsequently counted for radioactivity (counts
per minute) in a scintillation counter (MicroBeta2 Plate Counter, PerkinElmer)
using a scintillation cocktail (OptiPhaseSuperMix, PerkinElmer). 

Molecular dynamics simulation. To investigate the binding of glucagon to GCGR 
and the role of glucagon in receptor activation, we conducted long-time molecular 
dynamics simulations on the basis of the crystal structure of GCGR-NNC1702. 
The prepared GCGR structure was obtained by back-mutating the R173A muta- 
tion to its wild-type residue, omitting T4-lysozyme, completing ICL2 and adding 
A26 to the N terminus of GCGR. To obtain the prepared glucagon structure, the
residue H1 was added in an α-helical conformation to the N terminus of
NNC1702 in PyMOL (The PyMOL Molecular Graphics System, version 1.8),
avoiding steric clashes with neighbouring residues. The other three mutations
(D9E, Q24K(4×E) and M27L) were back-mutated to their wild-type residues. This
GCGR-glucagon model was used as the starting structure for molecular dynamics 
simulations. The chain termini of GCGR and glucagon were all charged, except for 
the C terminus of GCGR, which was capped with neutral groups. Most notably, 
all titratable residues were left in the dominant protonation state at pH 7.0, which 
was calculated using the H++ server (http://biophysics.cs.vt.edu/H++), and H1 
was protonated during all simulations. 

The GCGR-glucagon model was then embedded in a 90 Å × 90 Å palmitoyl-oleoyl-phosphatidylcholine (POPC) bilayer, and lipids located within 1 Å of the receptor were removed. The system was solvated in a box (90 Å × 90 Å × 156 Å) with the TIP3P water model and 0.15 M NaCl; it contained 241 lipid molecules, 28,622 water molecules, 92 chloride ions and 80 sodium ions, for a total of 125,237 atoms. Simulations were performed using the GROMACS 5.1.4 package in the isothermal-isobaric (NPT) ensemble with periodic boundary conditions. The CHARMM36-CMAP force field was used for the protein, glucagon, the POPC phospholipids, ions and water molecules. First, energy minimization was performed to relieve unfavourable contacts in the system; this was followed by equilibration steps totalling 50 ns to equilibrate the lipid bilayer and the solvent, with restraints on the main-chain or Cα atoms of GCGR. Subsequently, three parallel 1-μs production runs were performed. The temperature of the systems was maintained at 310 K using the v-rescale method with a coupling time of 0.1 ps. The pressure was kept at 1 bar using the Parrinello-Rahman method with τp = 1.0 ps and a compressibility of 4.5 × 10⁻⁵ bar⁻¹. SETTLE constraints were applied to the hydrogen-involved covalent bonds in water molecules, and LINCS constraints to the hydrogen-involved covalent bonds in other molecules; the time step was 2 fs. Electrostatic interactions were calculated with the particle-mesh Ewald algorithm with a real-space cut-off of 1.2 nm.
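The run settings above map onto GROMACS .mdp options roughly as sketched below. The option names (`tcoupl`, `tau-t`, `ref-t`, `pcoupl`, `compressibility`, `constraints`, `coulombtype`, `rcoulomb` and so on) are standard GROMACS parameters, but `integrator = md` is an assumption, and settings the text does not state (for example `tc-grps`, which must match the number of `tau-t`/`ref-t` values, `pcoupltype` and the van der Waals cut-offs) are deliberately omitted, so the rendered fragment is not a complete input file. The same snippet does the atom bookkeeping for the quoted system composition, assuming 134 atoms per CHARMM36 POPC molecule and 3 per TIP3P water; the remainder is attributed to GCGR plus glucagon, and the ion imbalance implies a solute net charge of +12 if the box is neutral.

```python
RUN_LENGTH_NS = 1000  # one 1-us production run
TIMESTEP_FS = 2


def nsteps(run_length_ns, timestep_fs):
    """Number of integration steps: 1 ns = 1,000,000 fs."""
    return run_length_ns * 1_000_000 // timestep_fs


# Values quoted in the Methods; integrator = md is an assumption.
MDP = {
    "integrator": "md",
    "dt": TIMESTEP_FS / 1000,  # ps
    "nsteps": nsteps(RUN_LENGTH_NS, TIMESTEP_FS),
    "tcoupl": "V-rescale",
    "tau-t": 0.1,  # ps
    "ref-t": 310,  # K
    "pcoupl": "Parrinello-Rahman",
    "tau-p": 1.0,  # ps
    "ref-p": 1.0,  # bar
    "compressibility": 4.5e-5,  # bar^-1
    "constraints": "h-bonds",  # rigid water is handled by SETTLE via the topology
    "constraint-algorithm": "LINCS",
    "coulombtype": "PME",
    "rcoulomb": 1.2,  # nm, real-space cut-off
}


def render_mdp(params):
    """Format a parameter dict as 'key = value' lines."""
    return "\n".join(f"{key:<21}= {value}" for key, value in params.items())


# Atom bookkeeping for the solvated membrane system (atoms per molecule assumed).
ATOMS_POPC, ATOMS_TIP3P = 134, 3
n_popc, n_water, n_cl, n_na, n_total = 241, 28_622, 92, 80, 125_237
environment_atoms = n_popc * ATOMS_POPC + n_water * ATOMS_TIP3P + n_cl + n_na
solute_atoms = n_total - environment_atoms  # GCGR + glucagon
solute_charge_if_neutral = n_cl - n_na

print(render_mdp(MDP))
print(f"nsteps per run: {MDP['nsteps']:,}")
print(f"solute atoms: {solute_atoms:,}; implied solute charge: +{solute_charge_if_neutral}")
```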

DEER spectroscopy of GCGR. A GCGR mutant (H89C/C171S) was generated by replacing the reactive native cysteine C171 with serine and introducing a reactive cysteine at position 89. The native cysteine C287 served as the reference site for monitoring GCGR conformational changes. The mutant was expressed as described above and purified either in the absence of ligand or in the presence of a ligand (NNC1702 or NNC0640), following the same protocol used to prepare protein samples for crystallization. For DEER measurements, the apo mutant and the mutant-ligand complexes were reacted with the sulfhydryl-specific label (1-oxyl-2,2,5,5-tetramethyl-Δ3-pyrroline-3-methyl) methanethiosulfonate (MTSSL, Toronto Research Chemicals) to generate R1 nitroxide side chains at positions 89 and 287, following standard procedures. The spin-labelled samples of the apo receptor and the receptor-ligand complexes were concentrated to 50–70 μM. For the receptor-ligand complexes, 50 μM NNC1702 or 30 μM NNC0640 was added to the buffer to increase protein stability. Deuterated glycerol (20%) was added to the samples as a cryoprotectant. The spin-labelled samples were loaded into quartz capillaries (1.5-mm ID and 1.8-mm OD) and flash-frozen in a dry-ice-ethanol bath. After freezing, they were loaded into an ER 5107D2 Q-band flexline resonator, and Q-band measurements were performed at 80 K on a Bruker ELEXSYS E580 spectrometer (at the University of Toronto) with a SuperQ-FTu bridge. A 32-ns π pump pulse was applied to the low-field peak of the nitroxide field-swept spectrum, and the observer π/2 (16-ns) and π (32-ns) pulses were positioned 50 MHz (17.8 G) upfield, which corresponds to the nitroxide centre line. Distance distributions were obtained from the raw DEER data using the LabVIEW program 'LongDistances' (v.593, by C. Altenbach, http://www.biochemistry.ucla.edu/biochem/Faculty/Hubbell/). Background correction was performed using a quadratic background. The primary DEER data were fitted with the 'model-free' algorithm implemented in the 'LongDistances' software. Nitroxide labels were modelled into the GCGR-NNC1702 crystal structure using the Multiscale Modelling of Macromolecular systems (MMM) software package (http://www.epr.ethz.ch/software/mmm-older-versions.html). Common nitroxide rotamers were used for the modelling.
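Two numbers in this protocol can be related back to distances and field positions. The sketch below uses the standard point-dipole approximation for a nitroxide pair, ν_dd ≈ 52.04 MHz nm³ / r³ (the frequency of the perpendicular orientation, for g close to the free-electron value), and the Zeeman relation ΔB = hΔν/(gμB) with an assumed nitroxide g of 2.0055; with these assumptions a 50-MHz offset corresponds to about 17.8 G, as stated above. The area-normalization helper uses a simple trapezoidal rule and is an assumption, not the procedure implemented in LongDistances; all function names are hypothetical.

```python
NU_DD_MHZ_NM3 = 52.04  # dipolar constant for two spins with g ~ 2.0023 (MHz nm^3)
MU_B_OVER_H_MHZ_PER_G = 1.3996245  # Bohr magneton / Planck constant (MHz per gauss)


def dipolar_frequency_mhz(r_angstrom):
    """Perpendicular-orientation dipolar frequency for point dipoles at distance r."""
    r_nm = r_angstrom / 10.0
    return NU_DD_MHZ_NM3 / r_nm ** 3


def distance_angstrom(nu_mhz):
    """Inverse of dipolar_frequency_mhz."""
    return 10.0 * (NU_DD_MHZ_NM3 / nu_mhz) ** (1.0 / 3.0)


def mhz_to_gauss(delta_mhz, g=2.0055):
    """Field offset equivalent to a microwave-frequency offset at fixed g."""
    return delta_mhz / (g * MU_B_OVER_H_MHZ_PER_G)


def normalize_area(r, p):
    """Scale a distance distribution p(r) so that its trapezoidal area equals 1."""
    area = sum((r[i + 1] - r[i]) * (p[i] + p[i + 1]) / 2 for i in range(len(r) - 1))
    return [value / area for value in p]


for r in (26.0, 32.0, 43.0):  # H89R1-C287R1 distances discussed for GCGR
    nu = dipolar_frequency_mhz(r)
    print(f"{r:.0f} A: {nu:.2f} MHz, period {1.0 / nu:.2f} us")
print(f"50 MHz offset = {mhz_to_gauss(50.0):.1f} G")
```

Because ν_dd scales as 1/r³, the longer distances populated by NNC1702 binding oscillate several times more slowly than the ~26 Å population.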

Data availability. Atomic coordinates and structure factor files for the GCGR- 
NNC1702 structure have been deposited in the RCSB Protein Data Bank with 
identification code 5YQZ. All other data are available from the corresponding 
authors upon reasonable request. 
Extended Data Figure 1 | Binding affinity and potency of NNC1702. a, Sequences of glucagon and NNC1702. b, Binding assay of NNC1702. Competitive binding of human glucagon (red dots) and NNC1702 (green squares) to membranes from BHK cells stably expressing human GCGR, on WGA-coated SPA beads. 125I-labelled glucagon (60 pM) and increasing concentrations of human glucagon or NNC1702 were used to generate the binding curves (representative example shown) and calculate IC50 values (glucagon: 1.2 ± 0.5 nM; NNC1702: 12.8 ± 6.6 nM). At least three independent experiments were performed with technical duplicates. c, Potency of NNC1702. The potencies of human glucagon (red dots) and NNC1702 (green squares) were determined by luciferase assays using BHK cells stably transfected with the human GCGR and CRE luciferase. Serial dilutions were prepared in medium (with 1 μM as the highest final concentration). Plate luminescence was read and EC50 values (glucagon: 22.8 ± 18.2 pM; NNC1702: 16.2 ± 8.4 nM) were calculated from the activation curves. At least three independent experiments were performed with technical duplicates (representative example shown). d, e, Inhibition by glucagon and NNC1702 of 125I-labelled glucagon binding to CHO-K1 cells expressing wild-type (WT) GCGR or the engineered GCGR used for crystallization. Data are shown as mean ± s.e.m. from three independent experiments performed in duplicate. 'Construct' indicates the GCGR construct used for crystallization. The IC50 values are listed in e.
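The caption values can be compared as fold shifts relative to glucagon. The short calculation below uses only the reported means; given the large standard deviations (for example 22.8 ± 18.2 pM), the ratios are indicative rather than precise. The unit table and helper names are illustrative.

```python
UNIT_TO_MOLAR = {"pM": 1e-12, "nM": 1e-9, "uM": 1e-6}


def to_molar(value, unit):
    """Convert a concentration with a unit label to molar."""
    return value * UNIT_TO_MOLAR[unit]


def fold_shift(test_molar, reference_molar):
    """How many times higher (weaker) the test value is than the reference."""
    return test_molar / reference_molar


# Mean values from the caption above (NNC1702 relative to human glucagon).
binding_shift = fold_shift(to_molar(12.8, "nM"), to_molar(1.2, "nM"))   # IC50
potency_shift = fold_shift(to_molar(16.2, "nM"), to_molar(22.8, "pM"))  # EC50
print(f"IC50: {binding_shift:.1f}-fold; EC50: {potency_shift:.0f}-fold")
```

On these means, the loss in functional potency (roughly 700-fold) is much larger than the loss in binding affinity (roughly 11-fold).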
Extended Data Figure 2 | Structural comparison between the GCGR-NNC1702 crystal structure and previously solved class B GPCR structures. a, b, Comparison between the GCGR-NNC1702 crystal structure and the cryo-electron microscopy structure of the GLP-1-GLP-1R-Gs complex in side (a) and extracellular (b) views. The GCGR-NNC1702 structure is shown in cartoon representation and coloured blue (GCGR) and red (NNC1702). The GLP-1-GLP-1R-Gs electron microscopy structure (PDB ID: 5VAI) is shown in cartoon representation and coloured grey (GLP-1R) and green (GLP-1). c, d, Comparison between the crystal structures of the GCGR-NNC1702 and GLP-1R-peptide 5 complexes in side (c) and extracellular (d) views. The receptor in the GLP-1R-peptide 5 structure (PDB ID: 5NX2) is shown in cartoon representation and coloured pink. The ligand peptide 5 is shown as yellow sticks. The red arrow (in d) indicates the rotation of the ECD in the GLP-1R-peptide 5 structure compared to the GCGR-NNC1702 structure. e, Comparison between the GCGR-NNC1702 structure and the GCGR-NNC0640-mAb1 structure. Only the GCGR TMD in both structures and the peptide ligand NNC1702 are shown as cartoons. The TMD in the GCGR-NNC1702 structure is in blue; the TMD in the NNC0640-bound structure is in yellow; and NNC1702 is in red. Close inspection of the two full-length GCGR structures revealed steric hindrance caused by residue S2 of NNC1702 and its contact with D385^7.42b in the peptide-bound structure, which pushes residue F365 on helix VI away from the ligand-binding pocket and consequently leads to the outward shift of the extracellular portion of helix VI (red arrow). Residues F365 and D385^7.42b in both structures are displayed as sticks. The hydrogen bond between S2 and D385^7.42b in the peptide-bound structure is shown as a green dashed line.
Extended Data Figure 3 | Ligand-binding pocket of NNC1702 and interactions between GCGR and NNC1702. a, Extracellular view of the binding pocket of the NNC1702 N-terminal region within the GCGR TMD. The receptor and the peptide ligand are shown as cartoons and coloured green (stalk), magenta (ECL1), cyan (ECL2), blue (TMD) and red (NNC1702). b, Binding site of the NNC1702 C-terminal region in the stalk (blue), ECL1 (magenta) and ECD (orange) of GCGR. c–e, Schematic representation of interactions between GCGR and NNC1702 analysed by LigPlot+ (ref. 43). c, Interactions between GCGR and the N-terminal region of NNC1702 (residues S2–Y10). d, Interactions between GCGR and the middle region of NNC1702 (residues S11–Q20). e, Interactions between GCGR and the C-terminal region of NNC1702 (residues D21–T29). The stick drawings of GCGR residues and NNC1702 are coloured grey and red, respectively. The labels of GCGR residues are coloured orange (ECD), green (stalk), blue (TMD), magenta (ECL1) and cyan (ECL2). The labels of NNC1702 residues are red.
Extended Data Figure 4 | Electron densities of the structure of the GCGR-NNC1702 complex. a, Electron densities of NNC1702. The peptide NNC1702 is shown in red cartoon representation and as brown sticks. Electron densities are contoured at 1.0σ from a 2Fo − Fc map and coloured blue. b–d, Electron densities of key GCGR residues involved in NNC1702 binding. The receptor is shown in grey cartoon representation. The key residues are shown as sticks and coloured yellow (ECD), green (stalk), magenta (ECL1) and blue (TMD).
Extended Data Figure 5 | DEER spectroscopy of GCGR-ligand complex assemblies. a, The GCGR-NNC1702 assembly showing modelled R1 spin labels at the ECD site H89R1 and the TMD site C287R1, based on the GCGR-NNC1702 crystal structure. The nitroxide rotameric models were generated with the MMM software package. b, Experimental distance distributions between the nitroxide spin-labelled R1 pair H89R1 and C287R1 in the apo state or in the presence of NNC0640 or NNC1702. The experimental distributions were normalized by the area under the curves for comparison. A predicted distance distribution derived from the GCGR-NNC1702 structure with the MMM software (offset blue trace) is also shown. This prediction can be compared directly with the experimentally measured distributions, although rotameric weighting may differ in the prediction. c, Background-corrected dipolar evolution functions (DEFs) and their fits for each of the GCGR samples. The DEFs were scaled so that the traces can be compared. The traces of the apo receptor and the GCGR-NNC0640 and GCGR-NNC1702 complexes are offset in the main plot to show the quality of the fits. The inset shows the overlaid portion of the DEFs. The DEER data demonstrate that all protein samples exhibit multiple peaks, and that addition of the peptide NNC1702 populates longer distances (32–43 Å), which match the distance distribution predicted by the MMM software using the GCGR-NNC1702 structure as a template (b). The main DEER distance shown by the apo GCGR and the NNC0640-bound receptor is around 26 Å. Conformations compatible with this distance include the inactive conformation observed in the GCGR-NNC0640-mAb1 structure, in which the H89R1-C287R1 distance is about 26 Å between
nitroxide N-O bonds when common R1 rotamers are used for modelling, and the different inactive conformational states of the apo receptor that display close contacts between the ECD and TMD, as suggested by previous molecular dynamics simulation studies, with H89R1-C287R1 distances of 23–29 Å between the nitroxide N-O bonds when R1 side chains are modelled. These results suggest that the ECD in the apo GCGR or the NNC0640-bound receptor may adopt one or multiple conformations, with an H89R1-C287R1 distance of about 26 Å between nitroxide N-O bonds. The longer distance upon binding of the peptide ligand NNC1702 indicates that the receptor ECD undergoes a conformational change to accommodate the peptide. These conformational states may exist in equilibrium: NNC1702 probably shifts the equilibrium towards the conformation favourable for peptide binding, in contrast to the small-molecule NAM NNC0640, which has a weak effect on the ECD conformation. An equilibrium between peptide-free and peptide-bound receptors may help to explain why more than one peak was observed for the GCGR-NNC1702 complex in this study, even though the concentration of NNC1702 used during protein purification and DEER measurements (50 μM) is much higher than the binding affinity of the peptide. G-protein binding may further shift the equilibrium towards the peptide-bound conformation, although experimental data on the G-protein-bound receptor are required to validate this point. Our findings support the flexibility of the ECD conformation and further highlight that a conformational change of the ECD is required for peptide binding.
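The statement that 50 μM NNC1702 is "much higher than the binding affinity" can be made quantitative with the single-site binding isotherm θ = [L]/([L] + Kd). The sketch below assumes that the free peptide concentration is about 50 μM and uses the IC50 from Extended Data Fig. 1 (12.8 nM) as a stand-in for Kd, which it is not strictly; under those assumptions the free peptide exceeds Kd by several thousand-fold and the predicted occupancy is above 99.9%.

```python
def fractional_occupancy(free_ligand_molar, kd_molar):
    """Equilibrium fraction of receptor bound for a single site: [L] / ([L] + Kd)."""
    return free_ligand_molar / (free_ligand_molar + kd_molar)


free_peptide = 50e-6   # M, assumed equal to the buffer concentration
kd_estimate = 12.8e-9  # M, IC50 used as an approximate Kd (assumption)
theta = fractional_occupancy(free_peptide, kd_estimate)
print(f"[L]/Kd = {free_peptide / kd_estimate:.0f}; occupancy = {theta:.4%}")
```

The isotherm ignores ligand depletion and non-ideal binding, so it bounds occupancy only under the stated assumptions.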
Extended Data Figure 6 | Comparison between the GCGR-NNC1702 structure and the GCGR-glucagon model derived from molecular dynamics simulations. a, Extracellular view of the transmembrane helical bundle. The GCGR-NNC1702 structure is shown in cartoon representation and coloured blue (GCGR) and red (NNC1702). The GCGR-glucagon model derived from molecular dynamics simulations is shown in cartoon representation and coloured orange (GCGR) and yellow (glucagon). The green arrows indicate shifts of helices VI and VII. b, Close-up view of the interaction between H1 of glucagon and D385^7.42b of GCGR in the molecular dynamics simulations. The NNC1702 residue S2, the glucagon residues H1 and S2 and the GCGR residue D385^7.42b in both the GCGR-NNC1702 structure and the GCGR-glucagon model are shown as sticks. The hydrogen bond formed by H1 and D385^7.42b in the molecular dynamics simulations is displayed as a red dashed line. c, Intracellular view of the transmembrane helical bundle. The green arrows indicate shifts of helices V and VI.
Extended Data Table 1 | Data collection and structure refinement statistics

Data collection*
Space group                   P212121
Cell dimensions
  a, b, c (Å)                 60.1, 108.8, 216.3
  α, β, γ (°)                 90, 90, 90
Resolution (Å)                50.0–3.00 (3.11–3.00)†
Rpim (%)                      5.2 (76.4)
I/σI                          21.6 (0.6)
Completeness (%)              97.8 (86.4)
Redundancy                    10.1 (5.3)

Refinement
Resolution (Å)                50.0–3.00
No. reflections               27,458
Rwork/Rfree (%)               23.2/26.1
No. atoms
  Protein                     4,328
  Peptide                     236
  Lipids/others               146
B-factors (Å²)
  GCGR                        119.7
  T4L                         155.8
  Peptide                     111.0
  Lipids/others               168.0
R.m.s. deviations
  Bond lengths (Å)            0.009
  Bond angles (°)             1.05

*Diffraction data from ten crystals were used to solve the structure.
†Values in parentheses are for the highest-resolution shell.
Extended Data Table 2 | Interactions between NNC1702 and GCGR


Residues in NNC1702 Residues in GCGR Interactions 
Ser2 D385^7.42b Hydrogen bond*
1491-475 Hydrogen bond 
Gln3 1912-646
L3867435 
Gly4 T296&Cl2 Hydrogen bond 
D299FCl2 
Thr5 1 3857:3% Hydrogen bond 
Y13g136 
F141139 
1451482 
Phe6 1 3a973% 
F383740 
L3867-43° 
1194267 
Thr7 1198271 
T2962 
Ser8 $297FCl2 Hydrogen bond 
Glu9 Q374^ECL3 Hydrogen bond
R378735 Salt bridge 
1.36b 
Tyr10 T9827
Ser11 L198271® 
Lys12 Q293Cl2 Hydrogen bond 
Tyr13 vi34stak 
L141 gg271b 
Leu14 2022.75 
Asp15 v2gFC? Hydrogen bond 
Arg17 qi315* Hydrogen bond 
Y2022:75> 
Arg18 Q204ecu! Hydrogen bond 
w215et 
Ala19 v2gEC0 
we7Eco 
Gln20 M123&00 Hydrogen bond
Qi31siak Hydrogen bond 
Asp21 1206£Cl" 
M2gFC2 
L32EcD 
F3360D 
Phe22 wee? 
1206ECU1 
V212ECL1 
Ye5ecD 
Val23 La5ecD 
we7Ec 
K64ECD 
Trp25 1206£CU" 
G207FCl" Hydrogen bond 
Ww36EC2 
Leu26 Ye5eCD 
ye4qec? 
Y65ECD 
Leu27 R116&00 Hydrogen bond
A118&°° 
Thr29 K64ECD Hydrogen bond 


*Polar interactions (hydrogen bonds and salt bridges) are listed.
Structure of the complement C5a receptor bound to
the extra-helical antagonist NDT9513727 


Nathan Robertson, Mathieu Rappas, Andrew S. Doré, Jason Brown, Giovanni Bottegoni, Markus Koglin, Julie Cansfield, Ali Jazayeri, Robert M. Cooke & Fiona H. Marshall


The complement system is a crucial component of the host response to infection and tissue damage. Activation of the complement cascade generates anaphylatoxins including C5a and C3a. C5a exerts a pro-inflammatory effect via the G-protein-coupled receptor C5a anaphylatoxin chemotactic receptor 1 (C5aR1, also known as CD88), which is expressed on cells of myeloid origin. Inhibitors of the complement system have long been of interest as potential drugs for the treatment of diseases such as sepsis, rheumatoid arthritis, Crohn's disease and ischaemia-reperfusion injuries. More recently, a role for C5a in neurodegenerative conditions such as Alzheimer's disease has been identified. Peptide antagonists based on the C5a ligand have progressed to phase 2 trials in psoriasis and rheumatoid arthritis; however, these compounds exhibited problems with off-target activity, production costs, potential immunogenicity and poor oral bioavailability. Several small-molecule competitive antagonists for C5aR1, such as W-54011 and NDT9513727, have been identified by C5a radioligand-binding assays. NDT9513727 is a non-peptide inverse agonist of C5aR1 and is highly selective for the primate and gerbil receptors over those of other species. Here, to study the mechanism of action of C5a antagonists, we determine the structure of a thermostabilized C5aR1 (known as C5aR1 StaR) in complex with NDT9513727. We found that the small molecule bound between transmembrane helices 3, 4 and 5, outside the helical bundle. One key interaction between the small molecule and residue Trp213^5.49 seems to determine the species selectivity of the compound. The structure demonstrates that NDT9513727 exerts its inverse-agonist activity through an extra-helical mode of action.

To obtain the structure of C5aR1, a thermostabilized receptor (StaR) was generated as described previously. C5aR1 was thermostabilized in the presence of the inverse-agonist radioligand [3H]NDT9513727 (N,N-bis(1,3-benzodioxol-5-ylmethyl)-1-butyl-2,4-diphenyl-1H-imidazole-5-methanamine), and the StaR contains 11 amino acid substitutions (Extended Data Fig. 1) that had no effect on the pharmacology or ligand binding of the receptor (Extended Data Fig. 2). To promote crystallization further, 29 and 17 residues were removed from the N and C terminus of the receptor, respectively. C5aR1 was crystallized in lipidic cubic phase and the structure was solved at 2.7 Å resolution (Extended Data Fig. 3 and Extended Data Table 1), with two copies of C5aR1 bound to NDT9513727 in the asymmetric unit (Fig. 1a). The overall structure of C5aR1 is similar to that of other class A receptors crystallized in the inactive state and consists of the canonical seven-transmembrane (TM1–TM7) helix arrangement (Fig. 1a). Cys109^3.25 (superscripts denote Ballesteros–Weinstein numbering) at the N-terminal end of TM3 forms a conserved disulfide bond with Cys188 in the second extracellular loop (ECL2), which itself forms an extended β-hairpin. Continuous density is observed for all extracellular and intracellular loops apart from the C-terminal connection of ECL2 with TM5 and the junction of TM7 with helix 8, and intracellular loop 2 (ICL2) adopts a two-turn α-helical structure similar to that of CCR9 and other chemokine receptor structures (Fig. 1a, c).
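Ballesteros–Weinstein labels are relative positions: the most conserved residue of each transmembrane helix is X.50, and every other residue in that helix is numbered by its offset from it. The sketch below reproduces the labels used in this text from residue numbers. The X.50 reference positions (3.50 = residue 134, 4.50 = residue 161, 5.50 = Pro214) are inferred here from the superscripts quoted for C5aR1 (for example Ile124^3.40 and Ile155^4.44) rather than taken from a sequence alignment, and the simple offset rule ignores helix bulges or constrictions; the helper name is hypothetical.

```python
# X.50 reference residue numbers for C5aR1, inferred from labels in this text.
C5AR1_X50 = {3: 134, 4: 161, 5: 214}


def ballesteros_weinstein(resnum, helix, x50=C5AR1_X50):
    """Return 'helix.position', where position = 50 + (resnum - X.50 residue)."""
    position = 50 + (resnum - x50[helix])
    if not 1 <= position <= 99:
        raise ValueError("residue is too far from the X.50 reference for this scheme")
    return f"{helix}.{position:02d}"


for name, resnum, helix in [("Cys", 109, 3), ("Ala", 128, 3), ("Leu", 163, 4), ("Trp", 213, 5)]:
    print(f"{name}{resnum}^{ballesteros_weinstein(resnum, helix)}")
```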

The extracellular portions of the transmembrane α-helices and ECLs have previously been shown to form the peptide-binding vestibule, or orthosteric site, of peptide-binding class A G-protein-coupled receptors (GPCRs), for example, the neurotensin 1 receptor (TM2–TM7 and ECLs). Electron density is well defined across this region in C5aR1; however, it is unoccupied by NDT9513727. Instead, the small molecule binds towards the intracellular side of the receptor and outside the transmembrane helical bundle (Fig. 1b–d). Furthermore, cold competition experiments (Extended Data Fig. 4) demonstrate no displacement of radiolabelled NDT9513727 by the C5a agonist peptide or the PMX53 antagonist macrocycle.

Clear difference density for NDT9513727 (Fig. 2a) shows one benzodioxolane group in a collapsed conformation, making a hydrophobic interaction with the imidazole core and the 2- and 4-phenyl groups, and the other benzodioxolane group in an extended conformation, packing against the 1-butyl group on the imidazole core (Fig. 2a, b). Residues on the outside of TM3, TM4 and TM5 form an extensive hydrophobic pocket with shape complementarity to NDT9513727. Ile124^3.40, Leu125^3.41 and Ala128^3.44 on TM3 and Leu209^5.45, Pro214^5.50 and Thr217^5.53 on TM5 supply hydrophobic interactions to the collapsed benzodioxolane group and the 2-phenyl ring. The extended benzodioxolane group and the 1-butyl component of the ligand sit in a hydrophobic pocket formed between TM3 and TM4, consisting of Leu125^3.41, Ala128^3.44, Thr129^3.45, Leu156^4.45, Val159^4.48, Ala160^4.49 and Leu163^4.52 (Fig. 2b, c). The extra-helical binding site in C5aR1 between TM3, TM4 and TM5 is distinct both from the more conventional orthosteric sites for GPCR ligands and from other negative allosteric sites identified outside the transmembrane helical bundle (Fig. 1b). For example, N-[2-[2-(1,1-dimethylethyl)phenoxy]-3-pyridinyl]-N'-[4-(trifluoromethoxy)phenyl]urea (BPTU) is bound to the outside of TM3 on the P2Y1 receptor, the site for AZ3451 on PAR2 is centred on a different region of TM3, and MK-0893 forms a clamp around the outside of TM6 on the glucagon receptor. Interestingly, however, the NDT9513727-binding site in C5aR1 is analogous to the site of the extra-helical full allosteric agonist AP8 (ago-PAM), which was solved recently in complex with the free fatty acid receptor GPR40.

The crucial interaction between C5aR1 and NDT9513727 is a single hydrogen bond supplied by the imidazole core of the ligand to the indole ring of Trp213^5.49 (Fig. 2c). This residue has previously been reported to confer species selectivity for W-54011 and NDT9520492, both of which are chemically similar to NDT9513727 (Extended Data Fig. 2). To confirm that Trp213^5.49 is crucial for NDT9513727 binding in human C5aR1, we mutated it to leucine (the equivalent residue in the mouse and rat C5aR1 sequences) (Extended Data Fig. 1) and found that this mutation abolishes binding of NDT9513727 to C5aR1. This was further confirmed by molecular dynamics simulations (Extended Data Fig. 5). In addition, mutation to phenylalanine of Ala128^3.44 and Thr129^3.45 (two residues that are highly conserved across C5aR1 orthologues and sit at the hydrophobic core of the binding pocket) severely affected NDT9513727 binding to C5aR1 (Fig. 2d and Extended Data Fig. 6). Notably, mutation of Thr129^3.45 to leucine had no effect on NDT9513727 binding, suggesting that the bulky phenylalanine substitutions disrupt the shape complementarity of the hydrophobic pocket to the ligand, and that more conservative mutations can be tolerated across this region. Finally, mutation of two further residues flanking the extra-helical binding site, Thr217^5.53 to leucine and Leu125^3.41 to phenylalanine (positioned directly above, and one helical turn below, Trp213^5.49), had no effect on NDT9513727 binding, demonstrating that these residues do not make a crucial contribution to the shape complementarity of C5aR1 to NDT9513727 (Fig. 2d and Extended Data Fig. 6).

Figure 1 | Ribbon and schematic representation of the C5aR1 crystal structure asymmetric unit. a, Ribbon diagram of the C5aR1 structure, viewed parallel to the membrane. NDT9513727 is shown in sphere representation with carbon, nitrogen and oxygen atoms coloured magenta, blue and red, respectively. Transmembrane helices, loops and approximate membrane boundaries are marked. b, Schematic cylinder representation of the C5aR1 monomer in rainbow colouration (N terminus, blue; C terminus, red); approximate positions of other solved allosteric sites are labelled. c, As in a, rotated 90° to view from the cytoplasmic space. d, As in c, in surface representation to examine the shape complementarity of the ligand-binding site.

Figure 2 | The NDT9513727 ligand-binding site and mutational analysis. a, Fo − Fc OMIT density contoured at 2.5σ, calculated before ligand inclusion. NDT9513727 is shown as sticks and coloured as in Fig. 1. b, Schematic of ligand interactions in the extra-helical NDT9513727-binding site. Boxed colour scheme follows the rainbow colouration in Fig. 1b. Hydrogen bonds are depicted as dashed red lines; distances in Å. c, View down the side of the receptor showing specific interactions of NDT9513727. d, [3H]NDT9513727 binding data for C5aR1 mutants across the allosteric site. Data are mean ± s.e.m. and representative of three biologically independent experiments performed in duplicate. Dagger denotes ambiguous values due to near complete loss of specific binding.

NDT9513727 seems to antagonize receptor activation from outside the helical bundle by stabilizing a network of interactions that hold the receptor in the inactive state and by inhibiting the helical movements required to transition to the agonist conformation for downstream signalling (Fig. 3). In the β2-adrenergic receptor (β2-AR), activation has been proposed to be initiated by ligand binding causing an approximately 2 Å inward movement of TM5 around Ser^5.46. The inward movement of TM5 at the proline bulge of Pro^5.50 disrupts a network of interactions between Pro^5.50, Ile^3.40, Phe^6.44 and Asn that stabilizes the receptor in the inactive state. Moreover, the inward movement at the top of TM5 (contributing to a contraction of the orthosteric site) is considered one of the structural signatures of class A receptor activation in general. The recently reported structures of GPR40 in complex with the partial agonist MK-8666, and in a ternary complex with the full ago-PAM AP8 (bound to an analogous extra-helical site between TM3, TM4 and TM5), demonstrate that the positive cooperativity of these compounds is embedded in the 'interlocution' of TM4 and TM5, with ago-PAM binding shifting TM5 along its axis by roughly half a helical turn towards the extracellular side relative to TM4. Superposition of C5aR1 with both the active-state β2-AR-Gs structure (PDB code 3SN6) and the GPR40-MK-8666-AP8 agonist ternary structure (PDB code 5TZY), which are themselves in close agreement in terms of TM4–TM5 (Fig. 3), suggests that NDT9513727 acts as a 'molecular wedge' between TM3, TM4 and TM5 of C5aR1. The extended benzodioxolane packs against TM4, the 2-phenyl group makes packing interactions between TM3 and TM5 involving Ile124^3.40, Ala128^3.44 and Pro214^5.50, and the imidazole core forms a crucial hydrogen bond to Trp213^5.49 (Fig. 2c), hindering the movement of TM5 relative to TM4 upon activation (Fig. 3). Indeed, molecular dynamics simulations that measure the inter-helical distances between TM3–TM5 and TM4–TM5 across the extra-helical NDT9513727-binding site over a 200-ns time course show that these distances decrease in the absence of NDT9513727 (Supplementary Videos 1 and 2).
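The text does not specify how the TM3–TM5 and TM4–TM5 distances were defined. One common choice, assumed here, is the distance between the centroids of the Cα atoms of short helix segments flanking the site, evaluated frame by frame. The residue selections, helper names and the toy trajectory below are hypothetical; in practice the per-frame coordinates would be read from the simulation trajectory with an analysis library.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def interhelix_distance(frame, residues_a, residues_b):
    """Distance between the C-alpha centroids of two residue selections.

    `frame` maps residue number -> C-alpha coordinate (x, y, z) in angstroms.
    """
    centre_a = centroid([frame[r] for r in residues_a])
    centre_b = centroid([frame[r] for r in residues_b])
    return math.dist(centre_a, centre_b)


# Hypothetical selections flanking the extra-helical site.
TM3_SEGMENT = (124, 125, 128)
TM5_SEGMENT = (209, 213, 217)


def synthetic_frame(separation):
    """Toy coordinates: two parallel segments `separation` angstroms apart."""
    frame = {r: (0.0, 0.0, 1.5 * i) for i, r in enumerate(TM3_SEGMENT)}
    frame.update({r: (separation, 0.0, 1.5 * i) for i, r in enumerate(TM5_SEGMENT)})
    return frame


trajectory = [synthetic_frame(s) for s in (12.0, 11.5, 11.0)]
series = [interhelix_distance(f, TM3_SEGMENT, TM5_SEGMENT) for f in trajectory]
print([round(d, 2) for d in series])
```

A decreasing series over simulation time, as in this toy input, is the signature the text reports for the ligand-free receptor.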

A range of strategies, including fluorescence resonance energy transfer (FRET), disulfide trapping (at Cys144 in ICL2) and mutagenesis analysis, have shown that C5aR1 dimerizes both in recombinant systems and in human neutrophils, with a TM4–TM5 contact interface previously proposed for C5aR1. The two copies of C5aR1 in the asymmetric unit assemble in a parallel fashion, making a network of interactions between TM4–TM4, TM4–TM5, ICL2 and the two copies of NDT9513727 themselves, and burying a surface area of 3,565.4 Å² (1,651.5 Å² without the contribution from NDT9513727) (Fig. 1 and Extended Data Fig. 7). However, another crystal form of C5aR1 with NDT9513727 was also obtained in initial screening (which consistently diffracted to lower resolution), revealing a single molecule in the asymmetric unit and difference density for NDT9513727, with the small molecule mediating no crystallographic or non-crystallographic contacts (data not shown). Furthermore, mutation of Ile155^4.44 to methionine, Val159^4.48 to phenylalanine and Gly162^4.51 to phenylalanine, in an attempt to disrupt NDT9513727 binding across the dimeric interface (Extended Data Fig. 7), had no effect on ligand binding in whole-cell functional assays (data not shown). Taken together, these results provide evidence that binding of the small molecule does not depend on the parallel TM4–TM5-mediated assembly observed between the two copies in the high-resolution crystal structure. In structural terms, evidence for potential modes of homodimerization is available for several GPCRs including, for example, β-AR, CXCR4, μ-opioid, κ-opioid and smoothened (SMO) receptors. The structure of the SMO receptor in complex with the antitumorigenic small-molecule antagonist LY2940680 (PDB code 4JKV) displays a TM4–TM5 contact interface most closely resembling that of C5aR1 (Extended Data Fig. 7). Although it is tempting to speculate that the non-crystallographic dimer of C5aR1 reported here is physiologically relevant, the different crystal forms that can be obtained for C5aR1 highlight the non-trivial nature of distinguishing physiologically relevant dimerization interfaces from those that simply mediate crystal contacts.

Figure 3 | NDT9513727 extra-helical antagonism and comparison of C5aR1 with the β2-AR-Gs and GPR40 agonist crystal structures. a, The C5aR1 structure (orange ribbon) overlaid with the β2-AR-Gs structure (β2-AR coloured blue; PDB code 3SN6). NDT9513727 is in sphere representation coloured as in Fig. 1, viewed from a plane parallel to the membrane. b, As in a, isolating the receptor seven-transmembrane domains. c, Close-up view of the C5aR1 NDT9513727-binding site compared to β2-AR. d, View of the GPR40 (green; PDB code 5TZY) ago-PAM (AP8) (yellow) binding site compared to β2-AR. e, View of the C5aR1 NDT9513727-binding site compared to GPR40 (green). f, Molecular surface representation of C5aR1 with NDT9513727 bound and the GPR40 ago-PAM overlaid.

The structure of C5aR1 complexed with NDT9513727 provides the first, to our knowledge, detailed view of a complement component receptor and reveals an extra-helical negative allosteric binding pocket between TM3, TM4 and TM5. The NDT9513727 ligand seems to act as a steric wedge that blocks the relative movement of TM5 (as seen in β2-AR-Gs and GPR40) and thereby inhibits activation of C5aR1. Interestingly, although this pocket centred on Trp213^5.49 forms the site for NDT9513727 and similar small molecules, previous mutation data together with results from our competition assay (Extended Data Fig. 4) suggest that cyclic peptide-based antagonists probably bind elsewhere, for example, in the orthosteric peptide-binding site. The C5aR1-NDT9513727 structure provides an example of an extra-helical binding site within the lipid bilayer that can be targeted to negatively modulate receptor activity (whether this binding site is ubiquitous or of functional relevance across GPCRs remains to be determined), building a more complete picture of the means by which GPCR activity can be controlled allosterically.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and investigators were not blinded to allocation during 
experiments and outcome assessment. 

StaR generation. Full-length human C5aR1 (1-350) was used as background 
for the generation of the conformationally thermostabilized receptor using a 
mutagenesis approach described previously. Mutants were analysed for thermostability in the presence of the radioligand [3H]NDT9513727. The C5aR1 StaR is
the full-length receptor with 11 thermostabilizing mutations. 

Cell culture. HEK293T cells were purchased from the American Type Culture 
Collection and were cultured in DMEM supplemented with 10% (v/v) fetal bovine 
serum (FBS). Cells were transfected using GeneJuice (Merck Millipore) according 
to the manufacturer's instructions and collected after 48 h. Cell lines were not tested
for mycoplasma contamination. 

Thermostability measurement. Transiently transfected HEK293T cells were 
incubated in 50 mM HEPES-NaOH pH 7.5, 150 mM NaCl, supplemented with cOmplete Protease Inhibitor Cocktail tablets (Roche), with 1% (w/v) n-dodecyl-β-D-maltopyranoside (DDM) at 4 °C for 1 h. All subsequent steps were performed at 4 °C. Samples were incubated with 200 nM [3H]NDT9513727 for 1 h and crude lysates were cleared by centrifugation at 16,000g for 15 min. Thermostability of the receptor was determined as previously described. Thermal stability (Tm) is defined as the temperature at which 50% of ligand binding is retained.
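As a minimal sketch of how a Tm such as those in Extended Data Fig. 1a can be read from binding-versus-temperature data, the Python below assumes a Boltzmann-shaped melt curve and estimates the 50% point by linear interpolation between the two bracketing temperatures. The Methods defer the actual procedure to earlier work, so the helper names (`boltzmann`, `estimate_tm`) and the synthetic curves are illustrative assumptions only.

```python
import math


def boltzmann(temp_c, tm, slope, top=100.0, bottom=0.0):
    """Assumed sigmoid: % specific binding left after heating at temp_c (degrees C)."""
    return bottom + (top - bottom) / (1.0 + math.exp((temp_c - tm) / slope))


def estimate_tm(temps, percent_bound):
    """Temperature at which binding crosses 50%, by linear interpolation.

    temps must be in ascending order; percent_bound is specific binding
    expressed as a percentage of the unheated control.
    """
    pairs = list(zip(temps, percent_bound))
    for (t0, b0), (t1, b1) in zip(pairs, pairs[1:]):
        # A sign change (or a point exactly at 50%) brackets the midpoint.
        if b0 != b1 and (b0 - 50.0) * (b1 - 50.0) <= 0:
            return t0 + (50.0 - b0) * (t1 - t0) / (b1 - b0)
    raise ValueError("binding never crosses 50%")


# Synthetic curves shaped like the wild-type and StaR melts (hypothetical data).
temps = list(range(4, 72, 4))
wt = [boltzmann(t, tm=18.0, slope=3.0) for t in temps]
star = [boltzmann(t, tm=44.0, slope=3.0) for t in temps]
print(round(estimate_tm(temps, wt), 1), round(estimate_tm(temps, star), 1))  # 18.0 44.0
```

Interpolation is used here only for transparency; fitting the full sigmoid by non-linear least squares would use every data point rather than just the two around 50%.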

Radioligand binding. For saturation binding experiments, HEK293T membranes transiently expressing C5aR1 or C5aR1(W213L) were incubated with varying concentrations of [3H]NDT9513727 (final assay concentration of 0–1,000 nM; assay buffer: 50 mM HEPES-NaOH pH 7.5, 150 mM NaCl, 1% (w/v) DDM). For competition binding experiments, HEK293T membranes transiently expressing C5aR1 were incubated with 200 nM [3H]NDT9513727 and varying concentrations of the cold ligands NDT9513727, W-54011, PMX-53 or C5a agonist (final assay concentration of 0–10 μM; assay buffer: 50 mM HEPES-NaOH pH 7.5, 150 mM NaCl, 1% (w/v) DDM). Binding assays were incubated for 2 h at 4 °C and the reactions were terminated by ligand separation via immobilized metal ion affinity chromatography (IMAC). Specific binding was determined by subtracting mock-transfected controls. Saturation and competition binding data were globally fitted to one-site specific binding or one-site heterologous competition models, respectively.
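The models named above can be written down compactly. The sketch below assumes the standard one-site hyperbola for saturation binding and the one-site logIC50 form for competition; the Methods name the models but not their exact parameterization, and the global non-linear fitting step itself is not shown. It also converts between pKd or pIC50 and concentration, the form in which Extended Data Figs 4 and 6 report their results. All function names are hypothetical.

```python
import math


def specific_binding(total, mock):
    """Receptor-specific signal: total binding minus the mock-transfected control."""
    return [t - m for t, m in zip(total, mock)]


def one_site_saturation(conc_nm, bmax, kd_nm):
    """One-site specific binding: B = Bmax * [L] / (Kd + [L])."""
    return bmax * conc_nm / (kd_nm + conc_nm)


def one_site_competition(log_conc_m, top, bottom, log_ic50):
    """One-site competition: Y = Bottom + (Top - Bottom) / (1 + 10**(X - logIC50))."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** (log_conc_m - log_ic50))


def p_value(conc_m):
    """pKd or pIC50 from a molar value: p = -log10(value in M)."""
    return -math.log10(conc_m)


def from_p_value_nm(p):
    """Inverse conversion: p-value back to a concentration in nM."""
    return 10.0 ** (-p) * 1e9


# The wild-type pKd of 6.59 (Extended Data Fig. 6) corresponds to roughly 257 nM.
kd_nm = from_p_value_nm(6.59)
print(round(kd_nm))  # 257
# At a free ligand concentration equal to Kd, half of Bmax is occupied.
print(one_site_saturation(kd_nm, bmax=1.0, kd_nm=kd_nm))  # 0.5
```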

Membrane preparation. cDNA encoding the human C5a receptor or C5a StaR 
construct was transfected into HEK293T cells using the transfection reagent 
Genejuice (Novagen). Forty-eight hours after transfection, cells were collected 
and washed twice with ice-cold PBS. The pellet was resuspended in ice-cold buffer containing 20 mM Tris-HCl, pH 7.4, 1 mM EDTA and homogenized with an Ultra-Turrax for 30 s at maximum speed. After centrifugation at 48,000g for 30 min at 4 °C, the pellet was resuspended and spun again. The final pellet was resuspended and frozen at −80 °C before use. Protein concentration was determined using the BCA protein assay method.

125I-C5a radioligand-binding assay. After thawing, membrane homogenates were resuspended in binding buffer (50 mM HEPES, pH 7.4, 1 mM CaCl₂, 0.5% bovine serum albumin) to a final assay concentration of 5 μg (wild type) or 20 μg (StaR) protein per well. Competition experiments were carried out using 25–30 pM 125I-C5a (in a total reaction volume of 250 μl) for 120 min on ice. At the end of the incubation, membranes were filtered with a Tomtec cell harvester onto a Unifilter (a 96-well white microplate with bonded GF/C filter, pre-incubated with 0.5% polyethylenimine) and washed five times with PBS. Non-specific binding was measured in the presence of 10 μM C5a. Radioactivity on the filter was counted (1 min) on a MicroBeta counter after the addition of 50 μl of scintillation fluid. Half-maximal inhibitory concentration (IC50) values were determined using Prism.
Truncation constructs. A panel of N- and C-terminal truncation variants of C5aR1 was designed and tested on the basis of secondary structure prediction.
The most suitable construct emerging from this screen comprised residues 29-333 
with the inclusion of a three-alanine spacer and a C-terminal deca-histidine tag. 
Expression, membrane preparation and protein purification. The truncated C5aR1 StaR(29–333) construct was expressed with a C-terminal deca-histidine tag in Spodoptera frugiperda Sf21 cells (Oxford Expression Technologies) using ESF 921 medium (Expression Systems) supplemented with 10% (v/v) heat-inactivated FBS (Sigma-Aldrich) and 1% (v/v) penicillin/streptomycin (PAA Laboratories) with a Bac-to-Bac Expression System (Invitrogen). Cells were infected at a density of 2.5 × 10⁶ cells ml⁻¹ with baculovirus at an approximate multiplicity of infection of 2. Cultures were grown at 27 °C with constant shaking and collected by centrifugation 48 h after infection. All subsequent steps were performed at 4 °C unless otherwise stated. Membranes were prepared by resuspension of cells in buffer containing 50 mM HEPES pH 7.5, 150 mM NaCl, supplemented with cOmplete Protease Inhibitor Cocktail tablets (Roche), followed by disruption using a microfluidizer at 60,000 pounds per square inch (M-110L Pneumatic, Microfluidics). Membranes
were collected by ultracentrifugation at 258,420g, resuspended in 50 mM 
HEPES-NaOH pH 7.5, 150 mM NaCl with cOmplete Protease Inhibitor Cocktail tablets (Roche), and stored at −80 °C until use. To purify the receptor, membranes were thawed at room temperature and incubated with 5 μM NDT9513727 for 30 min before solubilization with 1.2% (w/v) DDM and 0.12% (w/v) CHS (cholesteryl hemisuccinate; Anatrace, CH210) for 1 h. Insoluble material was removed by ultracentrifugation at 298,834g and the receptors were immobilized by batch binding to a 5 ml HiTrap TALON crude cartridge (GE Healthcare, 28-9537-67) connected to an ÄKTA FPLC system pre-equilibrated in buffer A: 50 mM HEPES pH 7.5, 150 mM NaCl, 0.03% (w/v) DDM, 0.003% (w/v) CHS and 2 μM NDT9513727. The bound material was eluted in buffer containing 300 mM imidazole. The protein was then concentrated using a 15 ml 100 kDa cut-off Vivaspin Turbo polyethersulfone (PES) concentrator (Sartorius, VS15T42), centrifuged at 932g in 3 min cycles at 4 °C in a Beckman Coulter Allegra X12-R centrifuge fitted with a swinging-bucket SX4750A ARIES rotor, and subjected to preparative size-exclusion chromatography (SEC) in 50 mM HEPES-NaOH pH 7.5, 150 mM NaCl, 0.12% (w/v) DDM and 0.012% (w/v) CHS on a Superdex 200 10/300 Increase column (GE Healthcare). Receptor purity was analysed by SDS–PAGE. Fractions containing the pure, monomeric receptor were concentrated to 18–23 mg ml⁻¹ in a 0.5 ml 100 kDa cut-off Vivaspin polyethersulfone (PES) concentrator (Sartorius, VS0142). Protein concentration was determined using the calculated extinction coefficient of the receptor at 280 nm (ε280,calc = 56,225 M⁻¹ cm⁻¹) and confirmed by quantitative amino acid analysis.
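The concentration measurement above is an application of the Beer–Lambert law, A = ε·c·l. A minimal sketch using the quoted ε280 is shown below. Converting to mg ml⁻¹ needs the construct's molecular weight, which the Methods do not give, so the 40,000 Da value is a hypothetical placeholder, as is the absorbance reading.

```python
EPSILON_280 = 56_225.0  # M^-1 cm^-1, calculated value quoted in the Methods


def molar_concentration(a280, path_cm=1.0, epsilon=EPSILON_280):
    """Beer-Lambert law, A = epsilon * c * l, solved for c in mol/L."""
    return a280 / (epsilon * path_cm)


def mg_per_ml(a280, mw_da, path_cm=1.0, epsilon=EPSILON_280):
    """Mass concentration: (mol/L) * (g/mol) = g/L, numerically equal to mg/ml."""
    return molar_concentration(a280, path_cm, epsilon) * mw_da


# Hypothetical reading: A280 = 0.56225 in a 1 cm path, with a placeholder
# molecular weight of 40,000 Da (NOT the actual mass of the construct).
print(molar_concentration(0.56225))  # about 1e-05 M, i.e. 10 uM
print(round(mg_per_ml(0.56225, mw_da=40_000.0), 3))  # 0.4
```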

Crystallization. Non-fusion C5aR1 StaR(29–333) was crystallized in lipidic cubic phase (LCP) at 22.5 °C. The concentrated protein (~20 mg ml⁻¹) was mixed with monoolein (Nu-Check) supplemented with 10% (w/w) cholesterol (Sigma-Aldrich) and 50 μM NDT9513727 using the twin-syringe method. The final protein:lipid ratio was 40:60 (w/w). A 50 nl bolus was dispensed on 96-well glass bases, overlaid with 750 nl precipitant solution using a Mosquito LCP from TTP Labtech and sealed with Laminex Film Covers (Molecular Dimensions). Elongated, plate-shaped crystals of C5aR1 StaR measuring 20–50 μm were grown in 100 mM tri-sodium citrate across a pH range of 5.5–6.0, 200 mM Na/K tartrate, 35–45% (v/v) polyethylene glycol 400 and 0.2 μM NDT9513727.

Diffraction data collection and processing. Single crystals were mounted for 
data collection, flash-frozen and stored in liquid nitrogen without addition of 
cryoprotectant. Diffraction data from 11 crystals, collected at Diamond Light Source beamline I24, were merged to assemble a 99.3% complete dataset to a final resolution of 2.7 Å. X-ray diffraction data were measured on a Pilatus 6M detector at beamline I24 using a beam size of 10 μm × 10 μm. Crystals initially diffracted to approximately 2.5 Å following exposure to a non-attenuated beam for 0.07 s per 0.25° of oscillation. It was possible to collect ~25° of useful data from each crystal before radiation damage became severe. Data from individual crystals were integrated using XDS. Data merging and scaling were carried out using the program AIMLESS from the CCP4 suite of programs. Data collection statistics are reported in Extended
Data Table 1. 
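The multi-crystal strategy above can be checked with some back-of-envelope arithmetic. The sketch below takes its numbers from this paragraph and from Extended Data Table 1. Because the ~25° wedge per crystal is approximate, the derived image counts and exposure times are estimates. The ratio of measured to unique reflections reproduces the overall redundancy of 5.0 reported in the table.

```python
def collection_summary(n_crystals=11, wedge_deg=25.0, osc_deg=0.25,
                       exposure_s=0.07, measured=131_919, unique=26_541):
    """Back-of-envelope numbers for the multi-crystal collection strategy."""
    images_per_crystal = wedge_deg / osc_deg        # images in one ~25 degree wedge
    seconds_per_crystal = images_per_crystal * exposure_s  # total beam time per crystal
    total_wedge_deg = n_crystals * wedge_deg        # approximate rotation summed over crystals
    multiplicity = measured / unique                # mean redundancy of the merged data
    return {
        "images_per_crystal": images_per_crystal,
        "seconds_per_crystal": seconds_per_crystal,
        "total_wedge_deg": total_wedge_deg,
        "multiplicity": multiplicity,
    }


summary = collection_summary()
print(summary["images_per_crystal"], round(summary["seconds_per_crystal"], 2),
      summary["total_wedge_deg"], round(summary["multiplicity"], 1))  # 100.0 7.0 275.0 5.0
```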

Structure solution and refinement. The structure was solved by molecular 
replacement using the program Phaser, with the μ-opioid receptor structure (PDB code 4DKL) as the input model, searching for two copies in the asymmetric unit. Initial refinement was carried out with REFMAC5 using maximum-likelihood restrained refinement in combination with the jelly-body protocol. Manual model building was performed in Coot. Further and final stages of refinement were performed with phenix.refine using positional, individual isotropic B-factor and TLS refinement. The later stages of refinement were performed with release of all non-crystallographic symmetry (NCS) restraints. The final refinement statistics (Rwork/Rfree = 20.8/23.8%) are presented in Extended Data Table 1. Structure quality was assessed with MolProbity.
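The Rwork and Rfree values quoted above follow the standard definition of the crystallographic R factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, evaluated separately on the working reflections and on the held-out test set. The sketch below assumes that Fcalc has already been scaled to Fobs. Refinement programs also apply scaling and bulk-solvent corrections, which are not modelled here, and the toy amplitudes are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc has already been scaled to f_obs.
    """
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_r_free(f_obs, f_calc, is_free):
    """Split reflections by the test-set flag; only the work set drives refinement."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Toy amplitudes (hypothetical), with the last reflection flagged as test set.
f_obs = [100.0, 80.0, 60.0, 50.0]
f_calc = [90.0, 84.0, 57.0, 40.0]
print(r_work_r_free(f_obs, f_calc, [False, False, False, True]))

# Test-set fraction implied by Extended Data Table 1: 1,352 of 26,440 reflections.
print(round(100 * 1352 / 26440, 1))  # 5.1
```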

Molecular dynamics simulations. Each system was pre-processed with the Protein 
Preparation Wizard in Maestro (Maestro v.11.1, Schrödinger, New York). Each system was solvated and enclosed in an orthorhombic simulation box after embedding the complex in a pre-equilibrated POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) bilayer using the System Builder tool implemented in Maestro. The OPLS3 force field was used. Simulations were performed on GPU-equipped workstations using Desmond (D. E. Shaw Research, New York) and Maestro-Desmond Interoperability Tools. First, each system was minimized for 5,000 steps. Minimized systems were gradually heated to 300 K in the NVT ensemble, with harmonic position restraints (50 kcal mol⁻¹ Å⁻²) applied to solute heavy atoms. Then, volume and density were equilibrated in the NPT ensemble for 200 ps at a target temperature of 300 K and a target pressure of 1 bar using a Nosé–Hoover chain thermostat and a Martyna–Tobias–Klein barostat with a 2.0 ps relaxation time, while gradually removing residual restraints (10 kcal mol⁻¹ Å⁻²) on the protein Cα atoms. Production runs were performed in the NPT ensemble with semi-isotropic pressure coupling on unrestrained systems. Short-range van der Waals and Coulomb interactions were cut off at 10 Å. Smooth particle mesh Ewald was used to evaluate long-range electrostatic interactions (Ewald tolerance = 10⁻⁹). The lengths of bonds involving hydrogen atoms were constrained using M-SHAKE. A RESPA integrator (2 fs time step, long-range electrostatics calculated every 6 fs) was used to accumulate 250 ns of simulated time for each system. Trajectories were analysed using VMD.
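The RESPA scheme above evaluates cheap, rapidly varying forces every inner step and expensive long-range forces less often. A 2 fs inner step with long-range electrostatics every 6 fs corresponds to three inner steps per outer step. The Python below is a toy sketch of the reversible RESPA splitting, applied to a one-dimensional oscillator with a stiff "fast" spring and a soft "slow" spring. It illustrates the idea only and is not Desmond's implementation. The force split, spring constants and function names are all hypothetical. The sketch also counts how many steps a 250 ns run implies.

```python
def respa_step(x, v, mass, fast_force, slow_force, dt_inner, n_inner):
    """One outer step of r-RESPA: slow-force half-kicks wrap n_inner Verlet steps."""
    dt_outer = dt_inner * n_inner
    v += 0.5 * dt_outer * slow_force(x) / mass   # opening half-kick from the slow force
    for _ in range(n_inner):                      # velocity Verlet on the fast force only
        v += 0.5 * dt_inner * fast_force(x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * fast_force(x) / mass
    v += 0.5 * dt_outer * slow_force(x) / mass   # closing slow half-kick
    return x, v


def step_counts(sim_ns, inner_fs=2.0, outer_fs=6.0):
    """Inner steps and long-range force evaluations needed for sim_ns of simulation."""
    total_fs = sim_ns * 1e6                        # 1 ns = 10^6 fs
    return round(total_fs / inner_fs), round(total_fs / outer_fs)


# Toy system: stiff 'fast' spring plus soft 'slow' spring (hypothetical constants).
K_FAST, K_SLOW, MASS = 1.0, 0.1, 1.0


def fast(x):
    return -K_FAST * x


def slow(x):
    return -K_SLOW * x


def energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * (K_FAST + K_SLOW) * x * x


x, v = 1.0, 0.0
e0 = energy(x, v)
for _ in range(1000):
    x, v = respa_step(x, v, MASS, fast, slow, dt_inner=0.01, n_inner=3)
drift = abs(energy(x, v) - e0) / e0

print(step_counts(250))  # (125000000, 41666667)
print(drift < 1e-3)      # total energy stays close to its initial value
```

Because the slow half-kicks are applied symmetrically around the inner loop, the scheme stays time-reversible. This is why the toy system's energy remains bounded rather than drifting steadily.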

Data availability. Atomic co-ordinates and structure factors have been deposited 
in the Protein Data Bank (PDB) under accession code 5O9H. The data that
support the findings of this study are available from the corresponding author 
upon reasonable request. 
Extended Data Figure 1 | Comparison of wild-type and thermostabilized C5aR1 and the C5aR1 StaR crystallization construct in schematic representation. a, Thermal stability of C5aR1 measured using [3H]NDT9513727 binding after solubilization in DDM. Wild-type full-length C5aR1 (closed circles) has a melting temperature (Tm) of 18 °C ± 1.05 °C, and full-length C5aR1 StaR (open circles) has a Tm of 44 °C ± 0.7 °C. Data are mean ± s.d. from 3 independent experiments. b, C5aR1 StaR crystallization construct in schematic snake-plot representation. Thermostabilizing mutations (green) are: S85A, I91A, I142A, N146R, L156A, F172A, R232A, A234E, L311E, S317E and N321E. Residues forming the NDT9513727 pocket are coloured pink. Disordered residues in the structure are grey. The disulfide bond between Cys109^3.25 and Cys188 is denoted by a dashed yellow line. c, Multiple sequence alignment of human, chimpanzee, orangutan, gorilla, macaque, gerbil, cattle, mouse, rat and trout C5aR1 across TM5. The asterisk indicates the tryptophan residue at Ballesteros–Weinstein position 5.49 that is crucial for the interaction of the small-molecule NDT9513727 with C5aR1.
Panel c, calculated pIC50 values (mean ± s.e.m.): C5aR wild-type 8.24 ± 0.11 and 6.0 ± 0.23; C5aR StaR 9.3 ± 0.09, 8.1 ± 0.21 and 6.1 ± 0.19.

Extended Data Figure 2 | Pharmacology of the C5aR1 StaR compared to wild-type C5aR1. a, Competition assays by displacement of 125I-radiolabelled C5a (125I-C5a) by C5a, NDT9513727 and W54011 applied to membranes from HEK293T cells transiently expressing full-length (1–350) wild-type C5aR1. b, Competition assays by displacement of 125I-C5a by C5a, NDT9513727 and W54011 applied to membranes from HEK293T cells transiently expressing the full-length (1–350) C5aR1 StaR. Data are mean ± s.e.m. from three biologically independent experiments. c, Calculated pIC50 values. Data are representative of three independent experiments ± s.e.m. d–f, 2D chemical structures of the small-molecule C5aR1 antagonists NDT9513727 (d), NDT9520492 (e) and W54011 (f).
Extended Data Figure 3 | Crystallization of C5aR1 StaR. a, b, Typical C5aR1 StaR(30–333) non-fusion crystals grown in lipidic cubic phase and complexed with NDT9513727, shown in visible light (a) and under polarized light (b). c, Crystals displayed diffraction out to approximately 2.5 Å after exposure to a non-attenuated beam for 0.07 s per 0.25° of oscillation at beamline I24, Diamond Light Source, UK. d–f, Views of C5aR1 StaR(30–333) packing in the monoclinic space group P1211, along the a (d), b (e) and c (f) axes.
Extended Data Figure 4 | Cold competition of wild-type C5aR1 and C5aR1 StaR(29–333) bound to [3H]NDT9513727. a–f, Cold competition of 200 nM [3H]NDT9513727 bound to solubilized cell lysate containing wild-type C5aR1 or C5aR1 StaR(29–333) with either NDT9513727, C5a agonist, PMX53 or W-54011. Data are representative of four independent experiments performed in duplicate ± s.d. IC50 values are inset with s.d. in parentheses. The datasets for the C5a peptide and PMX53 could not be analysed owing to absent competition. g, Data are mean ± s.e.m. from four biologically independent experiments performed in duplicate. The mean pIC50 values (s.e.m. in brackets) are 6.27 (0.10), 6.75 (0.23), 6.58 (0.10) and 6.54 (0.19) for the competition of NDT9513727 and W-54011 against wild-type C5aR1 and then C5aR1 StaR(29–333), respectively. The dagger symbol denotes no value owing to lack of observed competition.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


(Plots: a, C5aR wild-type; b, C5aR W213L. y axis, r.m.s.d. of NDT9513727 heavy atoms; x axis, time (nanoseconds).)
Extended Data Figure 5 | C5aR1 and C5aR1(W213L) molecular dynamics simulations. a, Molecular dynamics simulation of wild-type C5aR1 with NDT9513727 over a 250-ns time course, monitoring the root mean square deviation (r.m.s.d.) of all NDT9513727 heavy atoms. Inset, molecular dynamics model at time point 250 ns. b, Molecular dynamics simulation of C5aR1(W213L) with NDT9513727 over a 200-ns time course, monitoring the r.m.s.d. of all NDT9513727 heavy atoms. Inset, molecular dynamics model at time point 200 ns.
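The ligand r.m.s.d. traces in this figure measure how far the NDT9513727 heavy atoms move from a reference pose over the trajectory. The minimal sketch below computes the r.m.s.d. for matched atom lists. It assumes that the frames have already been superposed (for example, on the protein) and that the reference is a chosen frame; the caption does not state either choice. The three-atom trajectory is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """r.m.s.d. between two equally ordered lists of (x, y, z) coordinates.

    No superposition is performed: frames are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def rmsd_trace(frames, reference):
    """r.m.s.d. of every trajectory frame against a reference pose."""
    return [rmsd(frame, reference) for frame in frames]


# Hypothetical 3-atom 'ligand' that shifts 0.5 Å along x in each successive frame.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
frames = [[(x + 0.5 * i, y, z) for x, y, z in ref] for i in range(4)]
print(rmsd_trace(frames, ref))  # [0.0, 0.5, 1.0, 1.5]
```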
Extended Data Figure 6 | Saturation binding analysis of C5aR1 mutants with [3H]NDT9513727. a, Saturation binding of [3H]NDT9513727 to solubilized cell lysate containing wild-type C5aR1. b, Single experiment showing fluorescence size-exclusion analysis of solubilized cell lysates containing indicated mutant variants of C5aR1 with a C-terminal green … mutant variants of C5aR1. Data are representative of three biologically independent experiments performed in duplicate ± s.d. Kd values are inset with s.d. in parentheses. The mean pKd values (s.e.m. in brackets) are 6.59 (0.08), 6.42 (0.2), 6.28 (0.21) and 6.65 (0.15) for the wild-type, L125F, T129L and T217L mutants, respectively. The datasets for A128F, T129F
Extended Data Figure 7 | Detailed view of the C5aR1 non-crystallographic dimer and ligand-binding interface. a, Schematic of ligand–protomer interactions in the extra-helical NDT9513727-binding site across the C5aR1 non-crystallographic dimer. Colour scheme of the boxes is as in Fig. 1b. b, Close-up structural view of interactions depicted in a. c–k, Chainbow representation of the C5aR1 asymmetric unit (from a view parallel to the membrane, and rotated 90° to view with cylindrical helices from extracellular space) compared to a representative subset of crystallographic and non-crystallographic GPCR dimeric assemblies present in the PDB. l, The C5aR1 non-crystallographic dimer reported here most closely resembles that previously postulated for the SMO receptor.
Extended Data Table 1 | Data collection and refinement statistics for C5aR1 StaR

Data collection
  Number of crystals                      11
  Space group                             P1211
  Cell dimensions
    a, b, c (Å)                           83.1, 51.1, 119.0
    α, β, γ (°)                           90.0, 106.7, 90.0
  Number of reflections measured          131,919
  Number of unique reflections            26,541
  Resolution (Å)                          34.60–2.70 (2.83–2.70)
  Rmerge                                  0.195 (0.923)
  CC1/2                                   0.987 (0.631)
  Mean I/σ(I)                             7.0 (1.7)
  Completeness (%)                        99.3 (99.6)
  Redundancy                              5.0 (5.1)

Refinement
  Resolution (Å)                          19.89–2.70
  Number of reflections (test set)        26,440 (1,352)
  Rwork/Rfree                             0.2079/0.2385
  Number of atoms
    All                                   5,349
    Protein                               4,662
    Ligand (NDT9513727)                   86
    Others (lipids, ions, waters)         601
  Average B factors (Å²)
    All                                   46.24
    C5aR                                  45.02
    Ligand                                28.41
    Others (lipids, ions, waters)         58.19
  r.m.s.d.
    Bond lengths (Å)                      0.004
    Bond angles (°)                       1.035
  Ramachandran statistics
    Favoured regions (%)                  98.62
    Allowed regions (%)                   1.38
    Outliers (%)                          0.0
  MolProbity overall score (percentile)   1.33 (98th)

Values in parentheses indicate the highest-resolution shell.
ILLUSTRATION BY THE PROJECT TWINS.


THE RESEARCH HARDWARE IN 
YOUR VIDEO-GAME SYSTEM 


Motion sensors don’t just drive gameplay. With the right software, 
they can scan dinosaur skulls, monitor glaciers and help robots to see. 


BY ANNA NOWOGRODZKI 


A man with a black rectangular bar strapped to his chest walks a careful circuit around the skull of a
Tyrannosaurus rex. It’s not performance art. 
The black rectangle is a motion sensor called 
Kinect, and its wearer is using it at the Field 
Museum in Chicago, Illinois, to digitally capture the precise 3D shape of the dinosaur’s skull.
That’s a far cry from its developer's intended 
application. Microsoft designed it for use in 
video games, enabling Xbox users to con- 
trol their characters using movements and 


gestures rather than a handheld controller. But 
from the moment it was released, scientists 
and clinicians have been adapting the device, 
and other sensors including the Nintendo Wii 
Remote, PlayStation EyeToy and Leap Motion, 
to aid research in areas from robotics to glaci- 
ology to health care. They were quick to real- 
ize that the data the devices gather can be used 
for studies that involve measuring body move- 
ments, manipulating 3D objects or observing 
or building models of 3D spaces. 

The sensors come with a number of perks 
for scientists: they are affordable (most cost 
US$80-100), portable and compatible with 


free and easy-to-learn software. That makes 
them a nimble choice for many projects. 

But they do have significant limitations. 
Their specifications, such as resolution, tend 
to pale by comparison with industrial hard- 
ware, for instance, and the systems work better 
in living rooms than in the field. And their 
usefulness depends heavily on the type of 
research being performed. 


DINO DENTISTRY 

Denise Murmann’s experience with Kinect as 
a research tool began in 2016, when she visited 
the Field Museum with her family. While 
scrutinizing SUE, one of the world’s most
complete T. rex skeletons, her nephew noticed 
an exhibit explaining that the dinosaur’s skull 
was riddled with tiny holes of unknown ori- 
gin. Were they bite marks? The vestiges of an 
infection? Murmann thought it would be fun 
to examine the skull the way she investigates 
forensic bite-mark cases in her work as a foren- 
sic dentist. 

But her usual tools just weren't up to the job. 
SUE’s skull is about 1.5 metres long and weighs 
272 kilograms — far too large for highly accu- 
rate 3D dentistry scanners. So Murmann 
turned to the Camera Culture group at the 
Massachusetts Institute of Technology's Media 
Lab in Cambridge, where imaging researcher 
Anshuman Das suggested using a Kinect connected to a laptop. The resolution would be about ten times lower than that achieved with the industrial scanner, Das says, but the Kinect could handle the specimen’s dimensions.

So Das strapped the Kinect to his chest and 
walked slowly around the skull. The 3D scan 
revealed that not all the holes entered the skull 
at the same angle, so they probably weren't 
from a single bite. But they also tapered 
inwards, suggesting they were not the result of 
infection. The team published its findings in 
July (A. J. Das et al. PLoS ONE 12, e0179264; 
2017). Murmann’s project was not the first time SUE’s skull had been scanned, but the previous scan took 500 hours in a computed tomography scanner normally used to inspect space-shuttle components.
The Kinect scan took a matter of minutes in 
the museum itself. 


GLACIERS, GAITS AND ROBOTS 
Palaeontology is not the only field to benefit 
from game controllers. Ken Mankoff, a glaci- 
ologist with the Geological Survey of Denmark 
and Greenland, has used the Kinect to model 
glacier beds and the meltwater channels 
underneath them at 1-millimetre resolution. 
Such data can help glaciologists better under- 
stand how glacial melt influences sea levels. 
Usually, the data are collected using a LIDAR 
(light detection and ranging) system, Mankoff 
says, which can cost upwards of $10,000. 

Off-the-shelf video-game motion sensors 
also make convenient vision systems for robots. 
Robotics researchers Ashutosh Saxena of Stan- 
ford University in California and Chenxia Wu, 
then at Cornell University in Ithaca, New York, 
turned to the Kinect to design a robot that could 
learn a task just from ‘watching’ people. Their 
WatchBot comprises a computer and a laser 
pointer with a Kinect mounted on a tripod as 
its ‘eyes’. WatchBot was able to learn what steps
constituted a task, such as fetching food from 
an oven, well enough to identify a missed step 
60% of the time — sufficiently accurate to give 
it potential applications in manufacturing and 
safety monitoring. 

Other video-game sensors have proved useful 
in research as well. The controller made by Leap 


Motion in San Francisco, California, is designed 
to track fine hand and finger movements, and 
virtual-reality headsets such as the Daydream 
(by Google in Mountain View, California; 
about $80) and Rift (by Oculus VR in Menlo 
Park, California; $400-500) provide more 
immersive experiences. Hydrologist Willem 
Luxemburg at Delft University of Technology 
in the Netherlands used the Wii Remote to 
measure reservoir evaporation rates to bet- 
ter than millimetre accuracy. (The Wii is no 
longer in production, but used systems are avail- 
able online, as is the case for the Kinect, which 
Microsoft stopped manufacturing in October. 
Microsoft's newer HoloLens, augmented-reality 
glasses that are in limited production as their 
development continues, uses the same core 
sensor that powered Kinect.) 

Video-game sensors are also increasingly 
used in health care. Marjorie Skubic, an engi- 
neer at the University of Missouri in Columbia, 
began using the Kinect as soon as it was 
released in 2010 as a way to monitor seniors’ 
gait and predict their risk of falling. “It was 
right before Christmas,” she recalls. “We went
around town and bought them all up. I'm afraid 
we might have broken some kids’ hearts.” The 
Kinect was a major improvement on her team’s 
previous monitoring system: a webcam and a
large desktop computer, she says. The computer 
hogged space and generated so much heat that 
it required noisy fans, which felt intrusive. The 
Kinect eliminated both these issues, requir- 
ing a much smaller computer while accurately 
capturing seniors’ silhouettes as they moved. 


KINECT THE DOTS 

To capture objects in 3D, the Kinect takes a 
digital image just as an ordinary digital camera 
does, but also measures depth using infrared 
light. It then combines these two data sets to 
create a ‘depth image’ in which each pixel of the 
image is mapped relative to its distance from the 
sensor. From there, the system can create a 3D 
model or reconstruct a skeletal representation. 
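The article describes the depth image only in outline. One common way to turn such an image into a 3D point cloud is pinhole-camera back-projection: each pixel (u, v) with depth z maps to x = (u − cx)·z/fx and y = (v − cy)·z/fy. The sketch below assumes this generic model rather than reproducing Microsoft's software. The intrinsic parameters (fx, fy, cx, cy) are hypothetical placeholders; a real sensor's values come from calibration.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D camera-frame points (pinhole model).

    depth: rows of per-pixel depth values in metres (0 means no reading).
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    """
    points = []
    for v, row in enumerate(depth):          # v = pixel row index
        for u, z in enumerate(row):          # u = pixel column index
            if z <= 0:
                continue                     # skip pixels with no depth return
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny hypothetical 2 x 3 depth image and made-up intrinsics.
depth = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
]
print(depth_to_points(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.0))
# [(0.0, 0.0, 1.0), (2.0, 0.0, 2.0), (-1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
```

Scanning an object as large as SUE's skull means combining many such point clouds captured from different positions. This is the job of the stitching algorithms mentioned later in the article.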

Little expertise or equipment is required 
to exploit those data. All that’s needed is 
an adapter (available online for about $50) 
that links the Kinect to a laptop, plus a good 
graphical processing unit to handle the Kinect's 
real-time 3D constructions, Das says. “Some of 
these gaming laptops are perfect.” 

For those interested in playing with the 
platform, a large hacker community is ready 
to help. Microsoft also makes a software devel- 
opment kit that can be used to build custom 
applications that use Kinect data, and 3D Scan, 
a software package for object scanning, can be 
downloaded from the Microsoft app store. 
Skubic’s team started using the Kinect before either of these was available, so the researchers used an open-source programming library
called libfreenect from the OpenKinect project. 

Tiffany Tang, a researcher at Wenzhou- 
Kean University in China, developed a 
Kinect-based system to help people to read 
the emotions of children with autism. She has found the software (in her team’s case, Microsoft’s Kinect software development kit and Visual Studio) easy to get to grips with. “My student just learned this on his own in a week,” she says.

That ease of adoption can come in handy, 
because researchers may need to change 
platforms to keep up with developments in 
the fast-paced gaming industry. At Ulster 
University near Belfast, UK, rehabilitation 
researcher Suzanne McDonough and com- 
puter scientist Darryl Charles pair video-game 
sensors with custom software to monitor 
patients’ physical-therapy exercises at home 
and assign new ones as they progress. Over the 
years, McDonough and Charles have migrated 
from the EyeToy and Wii to webcams built for 
virtual-reality games, then through two ver- 
sions of the Kinect to track arm and hand 
movements, and finally to virtual-reality head- 
sets from Oculus and Google to provide a more 
immersive experience. They also use the Leap 
Motion sensor. “It’s very good at being able to 
recognize gestures and natural movements of 
the hand,” says Charles.

These tools do have substantial limitations, 
however. One issue with the Kinect is distance: 
because it was designed for living rooms, it 
can measure only a few metres from the sen- 
sor, Mankoff says. New algorithms, including 
Kintinuous and ElasticFusion, allow researchers to ‘stitch’ data together and overcome that
limitation, but other hurdles remain, espe- 
cially when it comes to fieldwork. “Anything 
wet is a problem. Direct sunlight is a problem,”
Mankoff says. “Fortunately my work is in caves, 
but if it weren't I would have to work at night 
or on very cloudy days.” Other issues include 
battery life and difficulty tracking people with 
unusual postures or loose clothing. 

And yet, scientists continue to find creative 
uses for the sensors. Since Das published the 
T. rex results, he has received multiple requests 
from the museum and palaeontology com- 
munities to use or adapt his scanner to analyse 
other fossils, art and artefacts. The tool is so 
simple that he has used it for a face-scanning 
exercise at a primary school in New Hampshire, 
where he volunteers. “You're not going to be 
matching an industrial scanner, but since 
it’s so cheap and it’s easy to share data, it will 
encourage collaboration,” Das says.


Anna Nowogrodzki is a science writer based 
in Boston, Massachusetts. 


CLARIFICATION 

The Toolbox ‘Need a paper? Get a plug-in’ 
(Nature 551, 399-400; 2017) stated 

that Eric Archambault is an independent 
bibliometrician. He is actually chief 
executive of Science-Metrix in Montreal, 
Canada. 
Biologist Stephani Gordon turned to freelance film-making to capture nature and science research on camera.


Science on the screen 


Film-making offers scientists the chance to transform research into stunning visuals. 


BY ROBERTA KWOK 


Stephani Gordon has filmed squid in the Gulf of California, a nineteenth-century whaling boat in the northwestern Hawaiian Islands and a search for Amelia Earhart’s plane in the central Pacific. In 2017, she shot footage off the coast of Mexico of pelagic creatures such as the paper nautilus (Argonauta nouryi) and vampire jellyfish (Vampyrocrossota childressi).

Gordon, sole proprietor of Open Boat Films in Portland, Oregon, spent more than a decade working as a field biologist, studying seabirds, sharks and other marine animals. But from 2004 to 2005, while working as a marine-ecosystem research specialist at the US National Oceanic and Atmospheric Administration (NOAA) in Honolulu, Hawaii, she served as a field guide for two nature photographers and was impressed by the large audience their images drew.

In December 2005, she learnt from a friend 
about a graduate programme in science and 
natural-history film-making at Montana State 
University in Bozeman, intended for students 
who have science, engineering or technology 
backgrounds. “A huge lightbulb went off” after 
reading the programme description, Gordon 
remembers. She recalls thinking: “This is what
I need to do. This fits me.”

Gordon had no experience making videos, but she had taken photographs for her university’s student newspaper, and had once considered photojournalism as a career. And she had always thought that science TV programmes should include more field-research details.

For accompanying videos, see the online story at

Gordon won a place on the course and learnt how to write scripts, direct a production and edit video. She began freelancing during her studies and continued full-time after graduation, working for clients such as National Geographic, the BBC, PBS and NOAA. She has no regrets about leaving research behind, even though freelance film-making presents challenges, among them an unstable income. “It just feels like the right medium for me,” she says.

Video offers researchers a dynamic way 
to communicate scientific concepts, ranging 
from the way microscopic algae tumble through 
water to dancers enacting Brownian motion. 
With the rise of mobile devices and a gen- 
eration that expects online visual content, the 
demand for videos is booming. YouTube boasts 
more than 1 billion users, who collectively 
watch about 1 billion hours of video per day,
according to the company’s website. “There is
just a hunger for visual media,” says Dennis Aig,
programme director of the science film-making 
course at Montana State University. 

Science videos generally aren't as popular 
as, say, gaming or music clips. But there is a 
demand for them, Aig says, because they can 
show research in remote or unusual places, and 
explain difficult concepts more clearly. 

It’s easier now for interested researchers 
to learn the requisite skills and produce con- 
tent, thanks to science-film-making graduate 
courses; general science-communication pro- 
grammes that offer video courses; and short- 
term training workshops. Expensive equipment 
is unnecessary — often, an iPhone and basic 
accessories will do. Some scientists make out- 
reach videos on the side, whereas others become 
full-time freelance film-makers, educational- 
content creators or staff members at production 
companies or non-profit organizations. 

Projects such as TV nature documentaries are 
highly competitive, with limited staff jobs and 
many freelancers trying to break in. And the 
work can be strenuous: hauling heavy gear and 
filming in cold, wet weather are often part of the 
job. But when the right images, sound and dia- 
logue come together, it is magical, says Charlotte 
Salvatico, a freelance film-maker, teacher and 
consultant. She is the Paris coordinator of Imag- 
ine Science Films (ISF), a non-profit based in 
New York City that runs science-film festivals 
and encourages connections between scientists 
and film-makers. (Nature sponsored awards at 
the Imagine Science Film Festival for several 
years, ending in 2016.) 


SCIENTIFIC FINESSE 

Gordon says that her field research prepared 
her to shoot nature documentaries. As a 
marine biologist, she performed delicate tasks 
such as collecting coral eggs with a syringe; 
jobs such as filming underwater, for example, 
require the same fine control over instruments. 
In the field, she grew accustomed to working 
in remote locations and fixing equipment, and 
knew how to avoid disturbing wildlife — skills 
that she uses constantly today. She says that the 
process behind her current work is similar to 
field research: collect observations, shape them 
into a story and distribute the product. “To me, 
science film-making feels totally equivalent to 
being a field biologist,” she says. 

Aig estimates that roughly 1,000-2,000 peo- 
ple with science backgrounds are making films 
professionally in the United States and Europe. 
In 2016, ISF launched an online database of science movies called Labocine, which now contains more than 2,000 titles, ranging from documentaries to avant-garde films; of those, about one-fifth were made by scientists, says Nate Dorr, director of programming at ISF. For example, the experimental film The Mirror System depicts a woman who dreams of memories while exploring a neuron ‘forest’. It was directed by Eva Zornio, an independent film-maker based in Geneva, Switzerland, who has a neuroscience background. Some research organizations are pushing the medium as well. Celldance, a programme run by the American Society for Cell Biology in Bethesda, Maryland, provides US$1,000 grants for scientists to produce videos about their research.

Science-film-making graduate programmes offer a structured route into the industry, and video experience may not be necessary to apply.
“We assume they don’t know anything,” Aig says 
of students in his programme. Similarly, the sci- 
ence and natural-history film-making gradu- 
ate programme at the University of Otago in 
Dunedin, New Zealand, looks for applicants 
with portfolios demonstrating a creative spark, 
but another medium such as photography 
or drawing is acceptable, says Lloyd Spencer 
Davis, founder of the university’s Centre for 
Science Communication. Other wildlife or 
environmental-film-making programmes are 
offered at the American University in Washing- 
ton DC, the University of Salford, UK, and the 
University of the West of England in Bristol, UK. 
Researchers can also enter science journalism or communication programmes that include video coursework, such as those at Imperial College London or Boston University in Massachusetts.

PHONE SKILLS
Shortcuts to filming

Researchers can try out film-making without expensive equipment. “You don’t need anything fancier than your phone,” says Rob Nelson, director of Untamed Science, a non-profit in Charlotte, North Carolina, that makes science videos. And Apple’s free editing software iMovie is generally sufficient for beginners. Film-making tutorials are available on the sites Vimeo Video School, Lynda.com, Khan Academy and Untamed Science’s YouTube channel Rob & Jonas’ Filmmaking Tips.

Extra equipment might be necessary to gather clear audio. Viewers can tolerate shaky video, but they will stop watching if it’s hard to hear the person speaking, says Huw James, founder of Anturus, an adventure-education company in Cardiff, UK, that produces science videos. When interviewing someone on camera, the film-maker should record audio using a separate phone or microphone positioned close to the person. When outdoors, a lavalier microphone with a wind shield is essential, James says.

Amateurs should be prepared to improve through trial and error. “The most crucial thing is being okay with failing quite a lot,” James says. “They’re not going to look great straight away.” R.K.

If graduate school is not an option, research- 
ers can seek unpaid internships on film pro- 
ductions to learn the ropes. For instance, a 
cephalopod researcher could assist on an octo- 
pus documentary by sharing knowledge about 
the creatures’ habitat. Researchers can ask 
industry contacts for mentor suggestions, or 
attend film festivals. During the first year of her 
film-making programme, Gordon approached 
John Brooks, an independent director of pho- 
tography and underwater cinematographer, at 
the Jackson Hole Film Festival in Wyoming, and 
offered to be his dive assistant. Partly because 
she was certified as a NOAA diver, he agreed to 
let her join a film project. 

Scientists can also pick up video skills at 
short workshops or university classes. Science- 
Film in Bowen Island, Canada, for exam- 
ple, holds workshops of 3-12 days to train 
researchers and other professionals to make 
videos. Students are not expected to become
full-time film-makers but to use video as a tool,
says Colin Bates, the company’s co-founder 
and an ecologist at Quest University Canada 
in Squamish. For instance, a researcher could 
create a video of a field or lab technique for 
a conference presentation to help explain the 
method, he says. The training could also help 
scientists to satisfy outreach requirements 
in grant applications, or to produce video 
abstracts for papers. Les Chercheurs Font Leur 
Cinéma (Researchers Make Their Movies), a 
programme run by Doc’Up, a doctoral-student
association in Paris, helps PhD students in the
Île-de-France area to make five-minute movies
about their research.

Creating videos allows scientists to better 
communicate their research to peers and the 
public, says Sally Warring, a protistologist at 
the American Museum of Natural History in 
New York City, who films microbes. She recalls 
a video made by a team at Harvard Medical 
School in Boston, Massachusetts, and the 
Technion-Israel Institute of Technology in 
Haifa showing the growth of antibiotic-resist- 
ant bacteria (go.nature.com/2bd0xjx), which 
she found more powerful than a graph. And 
if scientists are issuing a press release about a 
study, a companion video might pique journal- 
ists’ interest, she says. 

Scientists can dabble using basic equipment 
(see ‘Shortcuts to filming’). Some people start by 
creating short videos for social media. In 2015, 
Warring began filming pond microbes under 
the microscope with her iPhone. Her simple 
videos captured processes such as green algae 
producing a colony. Warring posted them on 
her Instagram account @pondlife_pondlife, 
which now has more than 48,000 followers. 

YouTube allows scientists to explain more-complex concepts. But videos should still be fast-paced and energetic, because users are easily distracted, says Dianna Cowern, who created the YouTube channel Physics Girl, now funded by PBS Digital Studios. “They can click away at any moment,” she says. She suggests avoiding standard classroom topics in favour of unusual phenomena, such as the way sand behaves like a fluid when air bubbles through it.

For more-ambitious projects, scientists can recruit a crew through social media or friends. Warring won funding from ISF to produce a film about lichen; for her six-minute documentary, she recruited film-making friends whom she’d met through graduate school or Instagram.

To make a short film during her neuroscience PhD programme, Salvatico got a student project grant from the organization Paris Sciences and Letters. A friend at film school introduced her to other students, who became crew members, and Salvatico recruited dancers by e-mailing a conservatory’s student office.

But video is time-consuming to produce. 
Each Instagram video takes Warring several 
hours, and Cowern’s YouTube videos each 
require about 3-7 days of work. A short docu- 
mentary can span months. 

And although some projects bring in 
income, returns are typically modest. War- 
ring earns thousands of dollars per year from 
licensing her photos and videos, and related 
projects such as creating an exhibition for the 
Brooklyn Botanic Garden in New York City. 
Before PBS Digital Studios started supporting Physics Girl, Cowern had about 125,000 subscribers and averaged around $500-1,000 per month in ad revenue. The network noticed her videos and invited her to join it in 2015; she now works full-time on Physics Girl. But reaching a point where YouTube-channel income can support a creator full-time is challenging, she cautions. Cowern produced about 35 videos over 3 years before joining PBS.

Aig estimates that for staff film-makers at production companies, annual salaries are around $30,000-40,000 for entry-level positions and $75,000-80,000 for middle managers. Top independent film-makers can make hundreds of thousands of dollars per year, but such cases are atypical. Gordon says that her net income is about 60-70% of what she earned as a scientist.

Students on a science film-making course in New Zealand put their skills to the test.

And film-making is not a cushy gig. “It is as hard as research, maybe even harder, to fully pull off,” Gordon says. Her 2017 expedition off the coast of Mexico hit a snag when the team had to switch research vessels, and the new boat lacked the equipment to support dives for underwater filming. Gordon assembled an in-water studio (custom-made aquariums, lighting and other components) to film animals brought on board instead. Because the species required cold water, she had to work in a walk-in fridge that blew freezing air on her while 5-metre swells buffeted the ship. “It was miserable,” she says.

But researchers drawn to the medium can start small, perhaps with a quick video of fieldwork. “Don’t overthink it,” says Rob Nelson, director of Untamed Science, a non-profit in Charlotte, North Carolina, that makes science videos. “Just grab a camera.”


Roberta Kwok is a freelance writer in 
Kirkland, Washington. 


CORRECTION 

The Turning Point ‘Gourmet investigator’ 
(Nature 551, 403; 2017) erroneously 
stated that Vayu Maini Rekdal’s mother 
was born in Kenya. In fact, she was born in 
Sweden. 

The Careers Feature ‘Super catalysts’ 
(Nature 552, 139-140; 2017) misspelled 
the colloquial term for Margarita 
Salas’s trainees: they are ‘Icfonians’, not 
‘Infonians’. 
Gender perspectives


Female co-authorship increases the likelihood that a medical-research paper will address gender-related differences in disease or treatment outcomes, a study in Nature Human Behaviour finds (M. W. Nielsen et al. Nature Hum. Behav. 1, 791-796; 2017). Neglecting these disparities, which affect health outcomes in conditions such as cardiovascular disease and osteoporosis, can have life-threatening consequences, the study adds. The authors analysed more than 1.5 million medical-research papers published between 2008 and 2015. They found that the research was most likely to address gender differences when female scientists were first and last authors. However, female researchers made up only 40% of first authors and 27% of last authors in the papers analysed. This is troubling, the study authors say, because last authors usually lead on identifying, planning and developing research pursuits in health disciplines. Increasing numbers of medical researchers, journal editors and science agencies already acknowledge the importance of including gender analysis in research, the authors note.


Tools for post-PhD life 


US graduate programmes are starting to 
formalize expectations for the skills and 
competencies that PhD students should 
have by the end of their studies, finds a 
report from the US Council of Graduate 
Schools (CGS) in Washington DC (see 
go.nature.com/2aab3gg). In a 2016 survey 
of its 241 member institutions, the CGS 
found that 65% of those responding 
reported that all or most of their doctoral 
programmes had developed formal ways 
to assess whether students are learning 
specific skills that are relevant to the 
workplace. The US academic community 
has long been considering how to address 
the fact that holders of science PhDs 
typically have not learned what they need 
for non-academic careers (see Nature 543, 
277; 2017). Employers outside academia 
want candidates with transferable skills 
(see go.nature.com/2m3fkfa), including 
experience in data science and big data; 
science policy; governance, risk and 
compliance; and time, project and budget 
management. The report recommends that 
universities work with employers to find 
out what they look for in job candidates. 
Universities in Australia, Canada and 
Europe have developed similar graduate- 
programme assessment metrics. 
UNIVERSAL PARKING, INC.

Driving a hard bargain.

BY JAMES ANDERSON

As a physics graduate student at a great university, my life was full.

One of the things it was most full of was trying to find a parking space. I was not important enough to get a university sticker and not rich enough to afford a pay lot.

One day, I had even less luck finding a space than usual. As I walked the many blocks back to my lab, I passed a new storefront with an enticing sign in the window: “Universal Parking, Inc. Affordable Parking.”

Sitting behind the counter was my old buddy, Alfred, who had left our graduate programme the previous year after a noisy altercation with his adviser.

I stuck my head through the door. “What’s happening, Al?” I asked.

“I joined the private sector,” he replied. “This is my new start-up.”

“How can you provide cheap parking so near the university?”

“You’d be surprised,” he said with a grin. “Our lab studies quantum computers, right? And quantum computers need error correction to get reliable results. So I wanted to see if the errors had structure.

“Guess what, they did. Decoded, the errors formed a message; in fact, they formed an advertisement. In English. It said: ‘Need money? Make big bucks solving the perennial urban problem: parking. Buy a parking franchise from Universal Parking, Inc. Not sold in stores.’ It gave instructions on how to build a device to contact the advertiser. So, of course, I told my adviser. And, naturally, he thought I was nuts and kicked me out.” He sighed.

“With my graduate career ended, I had no income. So I did the only thing I could do: I put together the contact device using a 1950s short-wave radio and a vacuum-tube stereo amplifier. Apparently, vacuum tubes are critical. And here we are.” He waved his hands.

“Their parking technology is based on the many-worlds model of quantum mechanics. You park your car in an alternative universe. They provide the hardware and software to park the cars and get them back. You drive the car onto the transmitting pad, press a button and the car goes away. Press another button and it comes back.”

I was speechless. Al had been in business for only a month and was already making money hand over fist. He’d even visited the parking lot. In a parallel universe. How could I resist?

“Any chance you’d take an old lab mate to see this lot?” I asked. After all, the scheme had to be using some radical new physics.

“Why not?” Al shrugged, and gestured at the nearest car. We clambered in and he pressed a button on his remote. Immediately, we found ourselves in the middle of a vast flat area. The lot stretched away in every direction, neatly marked into spaces. Every few yards stood a sign with a cartoon drawing of an ape.

A huge number of cars were already 
parked. Their shapes were similar to those 
at home but they had unfamiliar names. We 
saw a sporty Chrisler Baalrog and a pink, 
jacked up pick-up with massive wheels, a 
GMD Epicene. There was a Férd Pantocrator
sedan and a Fard Oriflamme coupé. Two
of the most striking vehicles were a sedan 
with three headlights, the Archimandrite 
Trinitarian, and a four-axle, eight-wheeled 
VWW Octopus. 

Occasionally, with a pop, a new car 
appeared. Sometimes a car vanished with a 
loud sucking noise. 

A faint rasping noise reached our ears. In 
the distance, we could make out an entity 
riding towards us in what looked like a melted 
golf cart. As it got closer, we could see that the 
creature was tall, thin, approximately human, 
and dressed in a glowing cerise uniform. 

He pulled up next to us. A box at his belt spoke in oddly accented English. “Welcome to our endeavour. I am joyful you found us. As new worlds join Universal Parking, we dedicate to them a portion of our parking facilities. We recently opened a section for primitive primates such as yourselves. I am the parking attendant. I have arrived to collect our fee.”

122 | NATURE | VOL 553 | 4 JANUARY 2018
© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

He read our itemized bill. “One, initial 
franchise fee. Two, cost per car storage. Three, 
cost per car retrieval. Four, fee for getting lost 
in the parking lot. Five, fee for getting found. 
Six, fee for talking to parking-lot staff.”

He removed a page from a notebook and handed it to Al. “We expect your payment in bitcoins by next week. The modest amount you owe is roughly equivalent to the income of one of your local administrative units, such as a state.

“If you can’t find enough bitcoins, I suggest arranging for the indentured servitude of a few hundred of your best hackers. We always need to upgrade our software, expand our operations and possibly cause our competitors operational difficulties.

“We expect prompt payment. Otherwise, we will engage our collection agent, whom I suspect you will find it difficult to like.”

He was about to leave when a second 
entity appeared, seemingly out of nowhere. 
This one was short, pudgy and vivid orange, 
including his jumpsuit, eyes and skin. 

He glared at the attendant. “Stop bother- 
ing these primitives. We have warned you 
before. Next time, we will impose financial
encumbrances with appropriate chastisement.
Go back to your educational cluster.
Return no more.”

The parking attendant and cart vanished 
with a sucking sound. 

The newcomer turned to us. “Be thankful 
we saved you from a low-quality fraudster,” 
he said. “He could have bankrupted your 
obscure planet. You were taken in by the 
‘Parking via quantum mechanics’ scam. This 
scam works only with entities at or below the 
level of business acumen of the primitives 
from your planet who traded prime Manhat- 
tan real estate for a handful of transistors. 
This is not even his parking lot; it belongs to 
the Omniverse Shopping Mall. 

“We will send you and your cars home, and
sever your first-order connection with the
multiverse. Get in contact with us when you
figure out what bitcoin mining really does.”

He vanished. 

Al and I waited to be transferred home,
having learnt to be careful of new physics.


James Anderson first subscribed to 
Astounding Science Fiction at the age of 12. 
His day job is professor of cognitive science 
at Brown University, where he constructs 
neural network models for cognition. 


ILLUSTRATION BY JACEY 


